Business Context and Project Scenario
The Bank of England prudentially regulates and supervises financial services firms through the Prudential Regulation Authority (PRA). The PRA is responsible for the prudential regulation and supervision of around 1,500 banks, building societies, credit unions, insurers, and major investment firms. To achieve this, the PRA examines various data sources, some more accessible than others. Quarterly results announcements are a particularly challenging data source to analyse. Traditional data science methods struggle to fully utilise this type of data for two reasons:
- They are unstructured in the form of text and/or video/webcasts.
- They are complex, requiring technical and financial background knowledge.
This project aims to enhance the use of these data sets to improve our risk assessment of individual firms and, in doing so, maintain financial stability.
The Bank of England's RegTech, Data and Innovation team (the "Team") would like to understand whether the quarterly results announcements provide additional information or insights on a firm. This includes, but is not limited to, the following:
- Topic modelling and sentiment analysis: Using a mix of text pre-processing and pre-trained language models (e.g. FinBERT and frameworks like BERTopic), is it possible to cluster the key topics raised by industry analysts and the sentiments related to those topics during Q&A sessions with the senior management team (e.g. in the earnings call Q&A transcripts)?
- Information summarisation: Using a mix of text pre-processing and pre-trained language models, can language models be used to extract and summarise key takeaways raised in these transcripts? Some text preprocessing/intermediary language model pipelines would probably be needed before generating summaries themselves so that the summaries are grouped in a manner that makes sense. Example groups could be:
- By topic; for example, your choice of specific issues raised by analysts (two or three topics would do, although you are welcome to cover more ground)
- By specific metrics; for example, a summary of all instances related to metric X, a summary of all instances related to metric Y, etc. (two or three metrics would do, although you are welcome to cover more metrics)
- By speaker; for example, analyst or banker.
The team is also keen to explore additional methods for extracting value from these data sources. This includes new technical approaches using GenAI/language models, as well as innovative ways to analyse and compare data, such as benchmarking a firm against its peers.
As the PRA regulates many institutions, the Team proposes focusing on one or two banks which have been identified in the list of global systemically important banks (G-SIBs).
For this project, call transcripts from JPMorgan Chase were selected for initial development and assessment of methodologies.
Setup: Libraries & Packages
%pip install bertopic
%pip install prettytable
%pip install scikit-learn
%pip install sentence-transformers
%pip install spacy
%pip install venn
%pip install PyMuPDF
%pip install -U -q PyDrive
!python -m spacy download en_core_web_sm
# standard library imports
import math
import time
import os
import re
import string
from collections import Counter
# third-party imports
import csv
import fitz
import json
import nltk
import matplotlib.colors as mcolors
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.io as pio
import seaborn as sns
import spacy
import tensorflow_hub as hub
import torch
import zipfile
from bertopic import BERTopic
from dateutil import parser
from hdbscan import HDBSCAN
from ipywidgets import Button, HBox, Output
from google.colab import drive, files, userdata
from matplotlib import pyplot as plt
from matplotlib_venn import venn2, venn3
from nltk.corpus import stopwords
from nltk.probability import FreqDist
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from prettytable import PrettyTable
from scipy.stats import gaussian_kde, linregress, mannwhitneyu, pearsonr
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, mean_squared_error, ConfusionMatrixDisplay
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification
from umap import UMAP
from wordcloud import WordCloud
from IPython.display import display, HTML
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('omw-1.4')
- Raw Data folder: contains the raw data, including the financial metrics data
- Processed Data folder: contains evaluation data sets, cleaned and tabularised data, and Phi-3.5 summarised data
- Output Data folder: contains files created by various models for analysis
# Get the base repository folder by going one level up from the notebooks folder
base_folder = os.path.abspath(os.path.join(os.getcwd(), "..", "data"))
# Define the path for each subfolder
raw_data_folder = os.path.join(base_folder, "Raw_Data")
processed_data_folder = os.path.join(base_folder, "Processed_Data")
output_data_folder = os.path.join(base_folder, "Output_Data")
# Create the directories if they do not exist
for folder in [raw_data_folder, processed_data_folder, output_data_folder]:
os.makedirs(folder, exist_ok=True)
0 Exploration of Financial Metrics Data
metrics_path = raw_data_folder + '/key_financial_metrics_JPMorgan_clean.xlsx'
metrics_df = pd.read_excel(metrics_path)
metrics_df['date'] = pd.to_datetime(metrics_df['date'])
# Getting the unique metric types
unique_metric_types = metrics_df['metric_type'].unique()
# Defining colors for each plot
colors = ['blue', 'green', 'red', 'purple']
# Plotting in a 2x2 grid
fig, axes = plt.subplots(2, 2, figsize=(18, 9))
# Flattening axes array for easy iteration
axes = axes.flatten()
# Looping through metric types and corresponding axes
for i, metric in enumerate(unique_metric_types):
subset = metrics_df[metrics_df['metric_type'] == metric]
axes[i].plot(subset['Q&FY'], subset['metric_value'], marker='o', color=colors[i])
axes[i].set_title(f'Metric: {metric}', fontsize=16, color=colors[i])
axes[i].set_xlabel('Quarter', fontsize=14)
axes[i].set_ylabel('Metric Value', fontsize=14)
axes[i].tick_params(axis='x', labelsize=12, rotation=45)
axes[i].tick_params(axis='y', labelsize=12)
axes[i].grid(visible=True, linestyle='--', alpha=0.5)
# Adjusting the layout
plt.tight_layout()
# Showing the plots
plt.show()
Picking out two interesting quarters to explore, based on the charts of key financial metrics above:
1Q22
In 1Q22, the bank faced significant financial challenges, marked by a low CET1 capital ratio and declining net income and EPS. This suggests reduced profitability and shareholder returns. The increase in provisions for credit losses suggests heightened caution regarding potential loan defaults. Overall, this quarter indicates a period of financial strain and risk management.
2Q24
In 2Q24, the bank reached peak performance across several key metrics, with its CET1 capital ratio, net income, and EPS all at their highest levels. This suggests a strong capital position, robust profitability, and high returns to shareholders. Provisions for credit losses, however, remained elevated. This indicates a continued cautious approach toward potential credit risks. Overall, this quarter indicates a period of strong financial results although the bank is maintaining caution to guard against possible economic uncertainties or loan defaults.
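The two quarters above were picked by eye from the charts; the same extremes can also be pulled out programmatically. A minimal sketch on toy data shaped like `metrics_df` (the metric values below are made up for illustration; the real figures come from the spreadsheet):

```python
import pandas as pd

# toy frame mimicking metrics_df's columns (metric_type, Q&FY, metric_value)
toy = pd.DataFrame({
    "metric_type": ["net_income"] * 4,
    "Q&FY": ["1Q22", "2Q22", "1Q24", "2Q24"],
    "metric_value": [8.3, 8.6, 13.4, 18.1],
})

# quarter with the highest / lowest value for each metric
peaks = toy.loc[toy.groupby("metric_type")["metric_value"].idxmax(), ["metric_type", "Q&FY"]]
troughs = toy.loc[toy.groupby("metric_type")["metric_value"].idxmin(), ["metric_type", "Q&FY"]]
print(peaks["Q&FY"].tolist(), troughs["Q&FY"].tolist())  # ['2Q24'] ['1Q22']
```

Applied to the full `metrics_df`, the same `idxmax`/`idxmin` pattern returns the peak and trough quarter per `metric_type`.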
1 Data Collection and Pre-Processing of Transcripts
Firstly, we downloaded all the earnings call transcripts from 2021Q2 to 2024Q3 from JPMorgan's website and saved them as a zip file in the raw data folder.
1.1 Extract information from transcript PDFs, put into a table and output as a csv
The following code performs batch-processing of PDF files without specifying links to individual documents.
1.1.0 Helper functions
def get_valid_date(text_string):
"""
helper function to check if a string is a valid date
PARAMS:
text_string (str) : some text
RETURNS:
(bool) : whether the text is a valid date
parsed_date (datetime.datetime) : date if detected, None otherwise
"""
try:
parsed_date = parser.parse(text_string)
return True, parsed_date
except (ValueError, OverflowError):
return False, None
def detect_date_in_list_of_strings(text_lines):
"""
detects a valid date given a list of strings
raises an error if a valid date is not detected
PARAMS:
text_lines (list) : list of strings
RETURNS:
date (datetime.datetime) : the detected date if present
"""
    valid_date, date = get_valid_date(text_lines[-1])  # expect date to be on the last line
    if not valid_date:
        # fall back to scanning the remaining lines
        for line in text_lines[:-1]:
            valid_date, date = get_valid_date(line)
            if valid_date:
                break
    if not valid_date:
        raise ValueError("Could not find a valid date!")
    return date
def detect_pattern(text_lines, pattern, unique=True):
"""
detects the only (unique=True) or all occurrences of a string pattern
given a list of strings
raises an error if the pattern is not detected
PARAMS:
text_lines (list) : list of strings
pattern (str) : a string to be detected in the provided strings
unique (bool) : whether the pattern must be present in exactly one
string in text_lines; defaults to True
RETURNS:
detected_text (str or list) : Text containing the pattern (if present)
If unique is True, there must be only one occurrence - returns a string
If unique is False, returns a list of all occurrences
"""
# expect the pattern to be in the first item
if pattern in text_lines[0]:
return text_lines[0]
detected_texts = [text for text in text_lines if pattern in text]
if unique:
if len(detected_texts)>1:
raise ValueError("More than one occurrence of the pattern detected!")
elif not detected_texts:
raise ValueError("Pattern not detected!")
return detected_texts[0] # return the first (and only) detected text
else:
return detected_texts # return all occurrences (even if empty list)
def remove_punctuation(text):
translator = str.maketrans('', '', string.punctuation)
return text.translate(translator)
def is_valid_word(word):
"""
checks if a word is title case, fully uppercase (acronym) or mixed-case
PARAMS:
word (str)
RETURNS:
bool
"""
return (
word.istitle() or
word.isupper() or
re.match(r'^[A-Z][a-zA-Z0-9]*$', word)
)
def get_all_caps_lines(text_lines):
"""
given a list of strings, returns the strings where all words are title case,
upper case (e.g. acronyms), or mixed-case. Small words, like "a", "or", "for",
are ignored, so "Bank of America" still gets returned
PARAMS:
text_lines (list of strs) : list that contains any sort of strings
RETURNS:
all_caps_lines (list of strs) : list of strings where all words (separated by spaces)
are title-, upper-, or mixed-case
"""
all_caps_lines = []
for line in text_lines:
# remove punctuation
line_clean = remove_punctuation(line)
# remove small words that are typically not capitalised
small_words = [
'a', 'an', 'the', 'and', 'but', 'or', 'for', 'nor', 'so',
'yet', 'at', 'by', 'for', 'from', 'in', 'of', 'on', 'to', 'with'
]
words = line_clean.split()
filtered_words = [word for word in words if word.lower() not in small_words]
line_clean = ' '.join(filtered_words)
# check if all words are capitalised (as in names or titles)
if all(is_valid_word(word) for word in line_clean.split()):
all_caps_lines.append(line)
return all_caps_lines
def remove_preceding_numbers(text):
"""
replaces leading numbers in a string with an empty string
"""
return re.sub(r'^\d+\s*', '', text)
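A quick sanity check of two of the helpers on toy strings (both input strings below are made-up examples, not lines from a real transcript):

```python
import re
from dateutil import parser

def remove_preceding_numbers(text):
    # same one-liner as the helper above: strip leading digits plus whitespace
    return re.sub(r'^\d+\s*', '', text)

def looks_like_date(text):
    # dateutil raises on non-dates, which is exactly what get_valid_date catches
    try:
        parser.parse(text)
        return True
    except (ValueError, OverflowError):
        return False

print(remove_preceding_numbers("12 Operator: Thank you."))  # Operator: Thank you.
print(looks_like_date("14 April 2021"), looks_like_date("MANAGEMENT DISCUSSION SECTION"))  # True False
```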
1.1.1 Unzip files and define the path to the PDF file folder
# Path to the zipped file
zip_file_path = os.path.join(raw_data_folder, "JPMorgan.zip")
# Path to the sub-directory for extracted files
jpm_folder = os.path.join(raw_data_folder, "JPMorgan")
# Create the target folder if it doesn't exist
os.makedirs(jpm_folder, exist_ok=True)
# Unzip the file
if os.path.exists(zip_file_path):
with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
zip_ref.extractall(jpm_folder)
print(f"Extracted PDF files from {zip_file_path} to {jpm_folder}")
else:
print(f"{zip_file_path} not found. Please check the file location.")
# Define the paths to the extracted PDF files
paths = [os.path.join(jpm_folder,f) for f in os.listdir(jpm_folder) if f.endswith(".pdf")]
1.1.2 Defining a JPMorgan PDF-to-table converter
The following function is tailored to the format of JPMorgan transcripts by assuming that:
- The first page contains the date on its own line
- The first page contains the quarter info formatted as "xQYY"
- Speakers are separated by "................."
- Everything said by the operator is preceded by "Operator: "
- There is a "MANAGEMENT DISCUSSION SECTION" and a "QUESTION AND ANSWER SECTION", titled in all caps
- All words in speaker and company names are capitalised; the only exceptions are small words like "of", "at", etc.
- All Qs and As start with speaker information formatted like [name]\n[title, firm]\n[Q/A]
If the above assumptions are met, this function will return a dataframe with the following information:
- uid (str) : informative unique identifier for each row
- bank (str) : bank name (for process_jpmorgan_pdf, it is always JPMorgan)
- year (int) : year discussed in the earnings call (for 4Q22, this is 2022, even though the earnings call took place in January 2023)
- quarter (int) : quarter discussed in the earnings call
- date (datetime) : date the earnings call took place
- section (str) : section the text comes from ("management_discussion" or "questions_answers")
- name (str) : name of the speaker
- title (str) : job title of the speaker
- firm (str) : firm the speaker represents
- qa_type (str) : text type ("Q", "A", "N" for questions, answers, and neither respectively)
- qa_num_within (int) : number of the question and answer within an earnings call; this can be used to map answers to questions within a call after processing
- qa_num (int) : number of the question and answer across all earnings calls; this can be used to map answers to questions across calls after processing
- qa_text (str) : question or answer text
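For intuition, the assumed speaker header can be unpacked as follows. The block below is a hypothetical example in the assumed [name]\n[title, firm]\n[Q/A] format, not the notebook's actual parsing code (which works off get_all_caps_lines):

```python
# hypothetical speaker block in the assumed [name]\n[title, firm]\n[Q/A] format
block = "Mike Mayo\nAnalyst, Wells Fargo Securities LLC\nQ\nHi, good morning. Quick question on deposits..."
lines = block.split("\n")

name = lines[0].strip()
# split on the first comma only, since firm names may contain commas
title, firm = (s.strip() for s in lines[1].split(",", 1))
qa_type = lines[2].strip()        # "Q", "A", or neither
qa_text = " ".join(lines[3:])     # everything after the header is the spoken text

print(name, "|", title, "|", firm, "|", qa_type)
# Mike Mayo | Analyst | Wells Fargo Securities LLC | Q
```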
def process_jpmorgan_pdf(path, q_num_init=0):
doc = fitz.open(path)
print(f"This document consists of {len(doc)} pages")
bank = "JPMorganChase"
df = pd.DataFrame(columns =["uid", "uid_prelim", "bank", "year", "quarter", "date", "section",
"name", "title", "firm", "qa_type", "qa_num_within", "qa_num", "qa_text"])
last_speaker = ""
current_section = ""
q_num_within = 0
q_num = q_num_init # count questions to be able to link As to Qs in case of separate processing
# iterate over pages
for page_num in range(len(doc)):
print(f"Working on page {page_num}...")
page = doc.load_page(page_num) # load a page
text = page.get_text() # extract text
text_stripped = text.strip()
text_lines = text_stripped.split('\n') # split text into a list of strings
if page_num == 0: # first page
date = detect_date_in_list_of_strings(text_lines) # when the report was published
year_published = str(date.year)
            try:
                expected_pattern = f"Q{year_published[-2:]}"
                quarter = detect_pattern(text_lines, expected_pattern, unique=True)
                year = int(year_published)  # year the report refers to
            except ValueError:
                # 4Q calls typically take place in January, so they refer to the previous year
                expected_pattern = f"Q{int(year_published[-2:])-1}"
                quarter = detect_pattern(text_lines, expected_pattern, unique=True)
                year = int(year_published)-1
            if len(quarter.split(" ")) > 1:
                # if quarter contains several "words", keep only the word with the pattern
                quarter = detect_pattern(quarter.split(" "), expected_pattern, unique=True)
            continue
# process the other pages
text_sections = text_stripped.split("........................................................")
text_sections = [item for item in text_sections if item.strip()] # filter out empty strings
# iterate over sections
for section in text_sections:
if "Operator: " in section:
last_speaker = "Operator"
if "QUESTION AND ANSWER" in section:
current_section = "questions_answers"
continue
if "MANAGEMENT DISCUSSION" in section:
current_section = "management_discussion"
continue
elif "QUESTION AND ANSWER" in section:
current_section = "questions_answers"
continue
# remove leading dots and empty lines
section_nodots = re.sub(r'^\.+', '', section)
lines = [item for item in section_nodots.split("\n") if item.strip()]
if not lines: # empty list
continue
# get lines were all words are capitalised (names, acronyms, "Q " or "A ")
all_caps_lines = get_all_caps_lines(lines)
# define desirable conditions
bool_length = len(all_caps_lines)>=2
# if not all_caps_lines and last_speaker != "Operator":
if not bool_length and last_speaker != "Operator":
# a section starts without anyone being introduced and it is not a
# continuation of Operator's speech means that previous Q/A is continued
# -- this will need to be merged with the preceding row afterwards
name = title = firm = qa_type = ""
qa_text = remove_preceding_numbers(section_nodots).replace('\n', '')
df.at[df.index[-1], 'qa_text'] = df.iloc[-1].qa_text + " " + qa_text
elif bool_length:
name = all_caps_lines[0].strip()
# print(all_caps_lines)
title = all_caps_lines[1].split(",")[0].strip()
firm = all_caps_lines[1].split(",")[1].strip()
# catch cases where three all-caps phrases get picked
# due to lines with just one capitalised word
wrong_catch = False
if len(all_caps_lines)>2:
if all_caps_lines[2].strip() not in ['Q', 'A']:
wrong_catch = True
if len(all_caps_lines)==2 or wrong_catch:
qa_type = "N"
qa_id_within = np.nan
qa_id = np.nan
qa_text = section_nodots.split(firm)[-1].replace('\n', '').strip()
else:
qa_type = all_caps_lines[2].strip()
if qa_type == "Q" and name != last_speaker:
q_num_within += 1
q_num += 1
qa_id_within = q_num_within
qa_id = q_num
# # if an executive answers before the analyst asks a question,
# # we do not want to count it as an answer to the earlier question
# if qa_type == "A" and last_speaker=="Operator":
# qa_id_within = np.nan
# qa_id = np.nan
                    # # commenting this out because sometimes executives continue
# # providing useful information even after the operator intervenes
str_splitter = section_nodots.split(qa_type)[0]
qa_text = section_nodots.split(str_splitter)[-1][2:].replace('\n', '').strip()
if qa_text=="": # try to catch PyMuPDF formatting errors!
if current_section=='questions_answers' and lines[0].strip()==name and not lines[2].strip()==qa_type:
qa_text = " ".join([item for item in lines if item not in all_caps_lines])
            # define an informative unique identifier
prelim_uid = f"{bank}_{quarter}_{qa_type}_{q_num_within}"
num_prelim_uids = (df.uid_prelim == prelim_uid).sum()
uid = prelim_uid + f".{num_prelim_uids}"
# check if it is not the same speaker continuing on a new page
if name == last_speaker:
df.at[df.index[-1], 'qa_text'] = df.iloc[-1].qa_text + " " + qa_text
# elif qa_id_within==0: # can happen if Q&A is started off by an executive
# pass
else:
df_to_append = pd.DataFrame({
"uid": [uid],
"uid_prelim": [prelim_uid],
"bank": [bank],
"year": [year],
"quarter": [int(quarter[0])],
"date": [date],
"section": [current_section],
"name": [name],
"title": [title],
"firm": [firm],
"qa_type": [qa_type],
"qa_num_within" : [qa_id_within],
"qa_num": [qa_id],
"qa_text": [qa_text]
})
if not df_to_append.isna().all(axis=1).any():
df = pd.concat((df, df_to_append), ignore_index=True)
last_speaker = name
else:
print("Excluding:\n", section_nodots)
continue
# raise ValueError("Failed!")
df = df.drop(["uid_prelim"], axis=1)
return df
1.1.3 Defining a wrapper function to batch-process all PDF files in the folder
Since each bank will need a dedicated transcript-processing function, this wrapper function asks the user to input the bank they wish to process.
All files in the supplied pathlist must belong to the same bank; otherwise, the wrapper function raises an error.
All information from the target bank is concatenated into a single csv file called "transcripts_tabular_{bank}.csv"
def transcript_pdf_to_csv(pathlist, save_folder):
banks = ['JPMorgan']
print("Which bank are you seeking to process?")
bank = input(banks)
print("Do all paths in `pathlist` lead to files from this bank?")
confirmation = input(['yes', 'no'])
if confirmation != 'yes':
raise ValueError("All paths must lead to the files from the chosen bank!")
df_all_transcripts = pd.DataFrame(columns =["uid", "bank", "year", "quarter", "date", "section",
"name", "title", "firm", "qa_type", "qa_num_within", "qa_num", "qa_text"])
for path in pathlist:
q_num_init = 0 if pd.isna(df_all_transcripts.qa_num.max()) else df_all_transcripts.qa_num.max()
if bank == 'JPMorgan':
df = process_jpmorgan_pdf(path, q_num_init)
df_all_transcripts = pd.concat((df_all_transcripts, df), ignore_index=True)
df_all_transcripts.to_csv(os.path.join(save_folder, f"transcripts_tabular_{bank}.csv"), index=False)
1.1.4 Converting all PDF files in the folder to a single csv table
Running the function prints all information that gets excluded. In the case of JPMorgan, it excludes the disclaimers on the last page of each transcript.
transcript_pdf_to_csv(pathlist=paths, save_folder=processed_data_folder)
1.1.5 Output
df_jpm = pd.read_csv(processed_data_folder + "/transcripts_tabular_JPMorgan.csv")
df_jpm.head()
| uid | bank | year | quarter | date | section | name | title | firm | qa_type | qa_num_within | qa_num | qa_text | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | JPMorganChase_3Q24_N_0.0 | JPMorganChase | 2024 | 3 | 2024-10-11 | management_discussion | Jeremy Barnum | Chief Financial Officer | JPMorgan Chase & Co. | N | NaN | NaN | Thank you and good morning, everyone. Starting... |
| 1 | JPMorganChase_3Q24_Q_1.0 | JPMorganChase | 2024 | 3 | 2024-10-11 | questions_answers | Jim Mitchell | Analyst | Seaport Global Securities LLC | Q | 1.0 | 1.0 | Hey, good morning. So, Jeremy, as you highligh... |
| 2 | JPMorganChase_3Q24_A_1.0 | JPMorganChase | 2024 | 3 | 2024-10-11 | questions_answers | Jeremy Barnum | Chief Financial Officer | JPMorgan Chase & Co. | A | 1.0 | 1.0 | Yeah. Sure, Jim. I'll try to answer both quest... |
| 3 | JPMorganChase_3Q24_Q_2.0 | JPMorganChase | 2024 | 3 | 2024-10-11 | questions_answers | Jim Mitchell | Analyst | Seaport Global Securities LLC | Q | 2.0 | 2.0 | All right. Thanks a lot. |
| 4 | JPMorganChase_3Q24_A_2.0 | JPMorganChase | 2024 | 3 | 2024-10-11 | questions_answers | Jeremy Barnum | Chief Financial Officer | JPMorgan Chase & Co. | A | 2.0 | 2.0 | Thanks. |
1.1.6 Checking the data
def check_dataset(df):
print('########## Confirm that all text rows contain text ##########')
print(f"There are {df.qa_text.isna().sum()} empty text rows")
print('\n')
print('########## Check whether categorical rows have the expected value range ##########')
categoricals = ['bank', 'section', 'name', 'title', 'firm', 'qa_type']
for cat in categoricals:
print(f"{cat}: {df[cat].unique()}")
print('\n')
print('########## Check that numerical columns have the expected value range ##########')
display(df.describe())
print('\n')
print('########## Confirm that all uids are unique ##########')
print(f"There are {df.uid.unique().shape[0]} unique uids and {df.shape[0]} rows in the dataset")
print('\n')
check_dataset(df_jpm)
########## Confirm that all text rows contain text ##########
There are 0 empty text rows

########## Check whether categorical rows have the expected value range ##########
bank: ['JPMorganChase']
section: ['management_discussion' 'questions_answers']
name: ['Jeremy Barnum' 'Jim Mitchell' 'Steven Chubak' 'Jamie Dimon' 'Erika Najarian' 'Glenn Schorr' 'Gerard Cassidy' "Matt O'Connor" 'Mike Mayo' 'Ebrahim H. Poonawala' 'Betsy L. Graseck' 'Saul Martinez' 'Ken Usdin' 'John McDonald' 'Matt O’Connor' 'Charles W. Peabody' 'Manan Gosalia' 'Ryan Kenny' 'John E. McDonald' 'Andrew Lim' 'James Mitchell' 'Ebrahim Poonawala' 'Charles Peabody' 'Kenneth M. Usdin']
title: ['Chief Financial Officer' 'Analyst' 'Chairman & Chief Executive Officer' 'Chief Financial Officer & Member-Operating Committee']
firm: ['JPMorgan Chase & Co.' 'Seaport Global Securities LLC' 'Wolfe Research LLC' 'UBS Securities LLC' 'Evercore ISI' 'RBC Capital Markets LLC' 'Deutsche Bank Securities' 'Wells Fargo Securities LLC' 'Bank of America Merrill Lynch' 'Morgan Stanley & Co. LLC' 'HSBC Securities (USA)' 'Jefferies LLC' 'Autonomous Research' 'Portales Partners LLC' 'Société Générale' 'Seaport Global Securities' 'Portales Partners' 'Jefferies & Company' 'BofA Securities' 'UBS' 'Seaport Research Partners' 'Société Générale SA (UK)']
qa_type: ['N' 'Q' 'A']

########## Check that numerical columns have the expected value range ##########
| year | quarter | qa_num_within | qa_num | |
|---|---|---|---|---|
| count | 1273.000000 | 1273.000000 | 1242.000000 | 1242.000000 |
| mean | 2022.359780 | 2.456402 | 21.553140 | 292.210950 |
| std | 1.042049 | 1.043786 | 12.453459 | 165.198839 |
| min | 2021.000000 | 1.000000 | 0.000000 | 1.000000 |
| 25% | 2021.000000 | 2.000000 | 11.000000 | 152.000000 |
| 50% | 2022.000000 | 2.000000 | 21.000000 | 298.500000 |
| 75% | 2023.000000 | 3.000000 | 31.000000 | 437.750000 |
| max | 2024.000000 | 4.000000 | 49.000000 | 564.000000 |
########## Confirm that all uids are unique ##########
There are 1273 unique uids and 1273 rows in the dataset
1.2 Data Cleaning
In the transcripts, there are situations where interruptions occurred; these are reflected as '...' in the text. Furthermore, at the end of some of the analysts' questions there are brief exchanges of greetings. Neither provides any information, and both should be removed.
# load the csv from the previous step
data = pd.read_csv(processed_data_folder + "/transcripts_tabular_JPMorgan.csv")
1.2.0 Helper functions / Load data
def print_row(row, uids):
if row['uid'] in uids:
print(f"\033[31m{row['qa_type']} ({row['name']}): {row['qa_text']}\033[0m")
else:
print(f"{row['qa_type']} ({row['name']}): {row['qa_text']}")
def print_multi_question(data, qa_num, uids):
data_qa = data[data['qa_num'] == qa_num]
data_qa.apply(lambda x: print_row(x, uids), axis=1)
def print_single_question(data, qa_num, uid):
data_qa = data[data['qa_num'].isin([qa_num, qa_num-1, qa_num+1])]
data_qa.apply(lambda x: print_row(x, [uid]), axis=1)
def get_action_for_multi(uids):
print_multi_question(data, num, uids)
action = input("Do you want to consider them together?\n")
return action
def get_action_for_single(uid):
pass
def split_string_by_punctuation(text):
    # dots in the abbreviations are escaped so they match literally
    punct_regex = r"(?=\S)(?:i\.e\.|J\.P\.|U\.S\.|ex\.|[A-Z][a-z]{0,3}\.|[^.?!]|\.(?!\s+[A-Z]))*.?"
    return re.findall(punct_regex, text)
def remove_sentence_with_three_dots(text):
    sentences = split_string_by_punctuation(text)
    if text.startswith('...') or text.startswith('…'):
        sentences = sentences[1:]
    if text.endswith('...') or text.endswith('…'):
        sentences = sentences[:-1]
    return ' '.join(sentences)
1.2.1 Dealing with interruptions
Identify where interruptions occur in the transcripts by searching for '…' or '...'.
Then go through each instance and ask whether to delete the entire row ('delete all'), delete the sentence containing the triple dots ('delete part'), or keep it.
If two instances are identified within the same Q/A, there is an extra option of merging the two rows and deleting everything in between ('merge'); this corresponds to the case where an unimportant interruption happened during someone's speech.
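The detection step boils down to a substring test for either ellipsis style; a sketch on toy rows (the strings below are made up):

```python
import pandas as pd

toy = pd.DataFrame({"qa_text": [
    "So if you look at the guidance ...",
    "Sorry, go ahead.",
    "… and that concludes my question.",
]})

# flag rows containing either a three-dot or a single-character ellipsis,
# mirroring the check in clean_interruptions
toy["interruption"] = toy["qa_text"].str.contains(r"\.\.\.|…", regex=True)
print(toy["interruption"].tolist())  # [True, False, True]
```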
def clean_interruptions(df):
# identify interruptions by search for triple dots
df['qa_text'] = df['qa_text'].astype(str)
    df['interruption'] = df.apply(lambda x: x['qa_text'].find('...')!=-1 or x['qa_text'].find('…')!=-1, axis=1)
interruption_num = df[df['interruption'] == True]['qa_num'].unique()
# create a dictionary of {qa_num: [list of uids]}
interruption_dict = {num: df[(df['interruption'] == True) & (df['qa_num'] == num)]['uid'].tolist() for num in interruption_num}
# create an interface to display interruptions and ask for actions
out = Output(layout={'width': '50em'})
display(out)
with out:
for num in interruption_num:
uids = interruption_dict[num]
out.clear_output()
# deal with the special case of 2 interruptions within one Q/A
if len(uids) == 2:
print_multi_question(df, num, uids)
action = input("Do you want to consider them together? \n")
if action.lower() == "yes":
action = input("What's your action? \nType 'merge' to combine the two rows and delete anything in between. \nType 'keep' to do nothing. \n")
if action == 'merge':
# append the text from the second row to the first row
df.loc[df['uid'] == uids[0], 'qa_text'] += ' ' + df.loc[df['uid'] == uids[1], 'qa_text'].values
# delete rows in between
start_index = df.index[df['uid'] == uids[0]][0]
end_index = df.index[df['uid'] == uids[1]][0]
index_to_delete = range(start_index+1, end_index+1)
df = df.drop(index_to_delete).reset_index(drop=True)
else:
for uid in uids:
out.clear_output()
print_single_question(df, num, uid)
action = input("What's your action? \nType 'delete all' to delete the entire row. \nType 'delete part' to remove the sentence containing the triple dots. \nType 'keep' to do nothing. \n")
if action == 'delete all':
df = df[df['uid'] != uid]
elif action == 'delete part':
df.loc[df['uid'] == uid, "qa_text"] = df.loc[df['uid'] == uid, "qa_text"].apply(remove_sentence_with_three_dots)
else:
for uid in uids:
out.clear_output()
print_single_question(df, num, uid)
action = input("What's your action? \nType 'delete all' to delete the entire row. \nType 'delete part' to remove the sentence containing the triple dots. \nType 'keep' to do nothing. \n")
if action == 'delete all':
df = df[df['uid'] != uid]
elif action == 'delete part':
df.loc[df['uid'] == uid, "qa_text"] = df.loc[df['uid'] == uid, "qa_text"].apply(remove_sentence_with_three_dots)
df.drop(columns=['interruption'], inplace=True)
return df
1.2.2 Deal with short texts
Identify rows that are short (word count <= 5). Many of these will be greetings recorded as separate rows.
The program goes through each instance and asks whether to delete or keep the row.
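The word-count rule can be checked on toy rows (the strings below are made up; the real filter in clean_short splits on single spaces rather than arbitrary whitespace):

```python
import pandas as pd

toy = pd.DataFrame({"qa_text": [
    "Thanks.",
    "Hey, good morning.",
    "A longer, substantive answer about net interest income guidance.",
]})

# flag rows with at most 5 words, as candidates for deletion
toy["short"] = toy["qa_text"].str.split().str.len() <= 5
print(toy["short"].tolist())  # [True, True, False]
```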
def clean_short(df):
# identify short texts that contain at most 5 words
df['short'] = df.apply(lambda x: len(x['qa_text'].split(' ')) <= 5, axis=1)
short_num = df[df['short'] == True]['qa_num'].unique()
# create a dictionary of {qa_num: [list of uids]}
short_dict = {num: df[(df['short'] == True) & (df['qa_num'] == num)]['uid'].tolist() for num in short_num}
out = Output(layout={'width': '50em'})
display(out)
with out:
for num in short_num:
uids = short_dict[num]
out.clear_output()
for uid in uids:
out.clear_output()
print_single_question(df, num, uid)
action = input("Do you want to delete this line? \nType 'yes' or 'no'. \n")
if action=='yes':
df = df[df['uid'] != uid]
df.drop(columns=['short'], inplace=True)
return df
1.2.3 Deal with hanging Q or A
Check for any hanging Q or A (a question or answer without its counterpart).
The program then goes through each instance and asks whether to delete the row or merge it with the adjacent Q/A. A lone answer is appended to the end of the previous Q/A as another answer; a lone question is appended to the start of the next Q/A.
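Detecting hanging rows reduces to counting rows per qa_num; a sketch on toy data (values are made up):

```python
import pandas as pd

toy = pd.DataFrame({
    "qa_num": [1, 1, 2, 3, 3],
    "qa_type": ["Q", "A", "Q", "Q", "A"],
})

# any qa_num with fewer than 2 rows is a hanging question or answer
counts = toy["qa_num"].value_counts()
hanging = sorted(counts[counts < 2].index)
print(hanging)  # [2]
```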
def clean_hanging(df):
df.reset_index(drop=True, inplace=True)
# getting all unique qa_num from the dataframe, then remove nan
qa_num = pd.unique(df['qa_num'])
qa_num = qa_num[~pd.isnull(qa_num)]
# loop through each question number, if there are less than 2 rows with such question number,
# then we have a hanging Q/A
hanging_num = []
for num in qa_num:
if df[df['qa_num'] == num].shape[0] < 2:
hanging_num.append(num)
hanging_dict = {num: df[df['qa_num'] == num]['uid'].tolist() for num in hanging_num}
out = Output(layout={'width': '50em'})
display(out)
with out:
for num in hanging_num:
for uid in hanging_dict[num]:
out.clear_output()
print_single_question(df, num, uid)
action = input("Do you want to merge or delete? \nType 'merge' to merge to the previous/next A/Q. \nType 'delete' to remove the row.\n")
if action == 'merge':
if df[df['uid'] == uid]['qa_type'].values[0] == 'Q':
# append to the next question
index = df[df['uid'] == uid].index[0]
next_qa_num = df.loc[index+1, 'qa_num']
next_qa_num_within = df.loc[index+1, 'qa_num_within']
df.loc[index, 'qa_num'] = next_qa_num
df.loc[index, 'qa_num_within'] = next_qa_num_within
elif df[df['uid'] == uid]['qa_type'].values[0] == 'A':
# append to the previous answer
index = df[df['uid'] == uid].index[0]
prev_qa_num = df.loc[index-1, 'qa_num']
prev_qa_num_within = df.loc[index-1, 'qa_num_within']
df.loc[index, 'qa_num'] = prev_qa_num
df.loc[index, 'qa_num_within'] = prev_qa_num_within
elif action == 'delete':
df = df[df['uid'] != uid]
return df
1.2.4 Update the uid to reflect changes in question number
def update_uid(df):
df.loc[~df['qa_num'].isna(), "uid"] = (
df[~df['qa_num'].isna()]
.assign(uid=lambda x: x.groupby('qa_num').cumcount() + 1)
.assign(uid=lambda x: [f"{bank}_{int(quarter)}Q{int(year-2000)}_{qa_type}_{uid}.0" for bank,quarter,year,qa_type,uid in zip(x['bank'], x['quarter'], x['year'], x['qa_type'], x['uid'])])
)
return df
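As a quick sanity check on the uid format produced above, the string pattern is `{bank}_{quarter}Q{two-digit year}_{qa_type}_{counter}.0`. A minimal sketch (the helper name and values are ours, for illustration only):

```python
# Hypothetical helper mirroring the uid string format used in update_uid:
# "{bank}_{quarter}Q{two-digit year}_{qa_type}_{counter}.0"
def make_uid(bank, quarter, year, qa_type, counter):
    return f"{bank}_{int(quarter)}Q{int(year - 2000)}_{qa_type}_{counter}.0"

print(make_uid("JPMorganChase", 3, 2024, "Q", 1))  # JPMorganChase_3Q24_Q_1.0
```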
1.2.5 Running the cleaning process and saving the cleaned data
data = clean_interruptions(data)
data = clean_short(data)
data = clean_hanging(data)
data = update_uid(data)
# save the data
data.to_csv(processed_data_folder + "/transcripts_tabular_JPMorgan_clean.csv", index=False)
1.2.6 Test on a subset of the dataset (if needed)
# load the uncleaned dataset and select the first 146 rows (corresponding to the latest 2 quarters)
test_data = pd.read_csv(processed_data_folder+"/transcripts_tabular_JPMorgan.csv")
test_data = test_data.head(146)
# manually add a row to showcase merging in same question
new_row = {'uid':'Extra', 'qa_type':'A', 'qa_num_within':4, 'qa_num':4, 'qa_text':'Random Insert' }
insert_index = 10
test_data = pd.concat([test_data.iloc[:insert_index], pd.DataFrame([new_row]), test_data.iloc[insert_index:]]).reset_index(drop=True)
test_data = clean_interruptions(test_data)
test_data = clean_short(test_data)
test_data = clean_hanging(test_data)
test_data = update_uid(test_data)
test_data
1.3 Initial Exploratory Data Analysis
1.3.1 Transcripts overview
# Defining a function to preprocess text
def preprocess_text(text):
# Checking if the input is a string
if not isinstance(text, str):
return []
# Converting the text to lowercase
text = text.lower()
# Removing punctuation from the text (including special quotation marks and apostrophes)
text = re.sub(r"[^\w\s]", '', text)
# Removing numbers from the text
text = re.sub(r'\d+', '', text)
# Tokenizing the text
tokens = word_tokenize(text)
# Removing stopwords from the text
stop_words = set(stopwords.words('english'))
tokens = [word for word in tokens if word not in stop_words]
# Lemmatizing the words
lemmatizer = WordNetLemmatizer()
tokens = [lemmatizer.lemmatize(word) for word in tokens]
return tokens
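The pipeline above depends on NLTK resources (punkt, stopwords, WordNet). The same steps can be sketched without those dependencies; the stopword set below is a tiny stand-in, and lemmatisation is omitted, so outputs will differ slightly from the real function:

```python
import re

# Simplified stand-in for preprocess_text: lowercase, strip punctuation and
# digits, whitespace-tokenise, drop stopwords. The stopword set is a small
# illustrative sample, not the full NLTK list; lemmatisation is omitted.
SAMPLE_STOPWORDS = {"the", "and", "a", "to", "of", "in", "we", "our"}

def preprocess_text_simple(text):
    # non-string inputs (e.g. NaN) yield an empty token list
    if not isinstance(text, str):
        return []
    text = re.sub(r"[^\w\s]", "", text.lower())   # drop punctuation
    text = re.sub(r"\d+", "", text)               # drop numbers
    return [w for w in text.split() if w not in SAMPLE_STOPWORDS]

print(preprocess_text_simple("We grew NII by 3% in the quarter."))
# ['grew', 'nii', 'by', 'quarter']
```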
# load the cleaned dataset
transcripts_df = pd.read_csv(processed_data_folder + "/transcripts_tabular_JPMorgan_clean.csv")
# Preprocessing the 'qa_text' column in the transcripts data
transcripts_df['qa_text_processed'] = transcripts_df['qa_text'].apply(preprocess_text)
# Viewing the preprocessed data
transcripts_df[['qa_text', 'qa_text_processed']].head()
| | qa_text | qa_text_processed |
|---|---|---|
| 0 | Thank you and good morning, everyone. Starting... | [thank, good, morning, everyone, starting, pag... |
| 1 | Hey, good morning. So, Jeremy, as you highligh... | [hey, good, morning, jeremy, highlighted, full... |
| 2 | Yeah. Sure, Jim. I'll try to answer both quest... | [yeah, sure, jim, ill, try, answer, question, ... |
| 3 | Hi. Good morning. So Jeremy, how are you? So I... | [hi, good, morning, jeremy, want, ask, expense... |
| 4 | Sure. So good question and I agree with your n... | [sure, good, question, agree, number, agree, w... |
# Pulling all words from the 'qa_text_processed' column into a list
transcripts_all_words = [word for tokens in transcripts_df['qa_text_processed'] for word in tokens]
# Calculating the frequency distribution of the words
transcripts_freq_dist = FreqDist(transcripts_all_words)
# Getting the top 10 words and their frequencies
transcripts_top_10 = transcripts_freq_dist.most_common(10)
transcripts_words, transcripts_counts = zip(*transcripts_top_10)
# Plotting the top 10 words in a barplot
plt.figure(figsize=(12, 6))
sns.barplot(x=list(transcripts_words), y=list(transcripts_counts), palette="viridis")
plt.title('Top 10 Words in the Transcripts')
plt.xlabel('Words')
plt.ylabel('Frequency')
plt.xticks(rotation=45)
plt.show()
# Combining the tokens from the 'qa_text_processed' column
transcripts_text = ' '.join([' '.join(tokens) for tokens in transcripts_df['qa_text_processed']])
# Generating the word cloud for the text
transcripts_wordcloud = WordCloud(width=800, height=400, background_color='white').generate(transcripts_text)
# Plotting the word cloud
plt.figure(figsize=(10, 5))
plt.imshow(transcripts_wordcloud, interpolation='bilinear')
plt.title('Word Cloud for the Transcripts')
plt.axis('off')
plt.show()
Many of the words that appear are not hugely informative. As such, I will preprocess the text again, this time removing an additional set of custom stopwords (filler words and speaker names).
# Defining a function to preprocess text
def preprocess_text(text):
# Checking if the input is a string
if not isinstance(text, str):
return []
# Converting the text to lowercase
text = text.lower()
# Removing punctuation from the text (including special quotation marks and apostrophes)
text = re.sub(r"[^\w\s]", '', text)
# Removing numbers from the text
text = re.sub(r'\d+', '', text)
# Tokenizing the text
tokens = word_tokenize(text)
# Defining stopwords and adding custom words
stop_words = set(stopwords.words('english'))
custom_stop_words = {"think", "going", "thats", "like", "bit", "thing", "yeah",
"see", "would", "youre", "question", "could", "dont",
"stuff", "jeremy", "lot", "betsy", "teresa", "michael",
"hi", "hey", "hello", "maybe", "jamie", "go", "weve"}
stop_words.update(custom_stop_words)
# Removing stopwords from the text
tokens = [word for word in tokens if word not in stop_words]
# Lemmatizing the words
lemmatizer = WordNetLemmatizer()
tokens = [lemmatizer.lemmatize(word) for word in tokens]
return tokens
# Preprocessing the 'qa_text' column in the transcripts data
transcripts_df['qa_text_processed'] = transcripts_df['qa_text'].apply(preprocess_text)
# Viewing the preprocessed data
transcripts_df[['qa_text', 'qa_text_processed']].head()
| | qa_text | qa_text_processed |
|---|---|---|
| 0 | Thank you and good morning, everyone. Starting... | [thank, good, morning, everyone, starting, pag... |
| 1 | Hey, good morning. So, Jeremy, as you highligh... | [good, morning, highlighted, full, year, nii, ... |
| 2 | Yeah. Sure, Jim. I'll try to answer both quest... | [sure, jim, ill, try, answer, question, togeth... |
| 3 | Hi. Good morning. So Jeremy, how are you? So I... | [good, morning, want, ask, expense, light, com... |
| 4 | Sure. So good question and I agree with your n... | [sure, good, agree, number, agree, way, youve,... |
# Pulling all words from the 'qa_text_processed' column into a list
transcripts_all_words = [word for tokens in transcripts_df['qa_text_processed'] for word in tokens]
# Calculating the frequency distribution of the words
transcripts_freq_dist = FreqDist(transcripts_all_words)
# Getting the top 10 words and their frequencies
transcripts_top_10 = transcripts_freq_dist.most_common(10)
transcripts_words, transcripts_counts = zip(*transcripts_top_10)
# Plotting the top 10 words in a barplot
plt.figure(figsize=(12, 6))
sns.barplot(x=list(transcripts_words), y=list(transcripts_counts), palette="viridis")
plt.title('Top 10 Words in the Transcripts')
plt.xlabel('Words')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()
# Combining the tokens from the 'qa_text_processed' column
transcripts_text = ' '.join([' '.join(tokens) for tokens in transcripts_df['qa_text_processed']])
# Generating the word cloud for the text
transcripts_wordcloud = WordCloud(width=800, height=400, background_color='white').generate(transcripts_text)
# Plotting the word cloud
plt.figure(figsize=(10, 5))
plt.imshow(transcripts_wordcloud, interpolation='bilinear')
plt.title('Word Cloud for the Transcripts')
plt.axis('off')
plt.show()
1.3.2 1Q22 and 2Q24
Now, let's identify two interesting quarters to investigate based on the financial metrics.
# Getting the unique metric types
unique_metric_types = metrics_df['metric_type'].unique()
# Defining colors for each plot
colors = ['blue', 'green', 'red', 'purple']
# Plotting in a 2x2 grid
fig, axes = plt.subplots(2, 2, figsize=(18, 9))
# Flattening axes array for easy iteration
axes = axes.flatten()
# Looping through metric types and corresponding axes
for i, metric in enumerate(unique_metric_types):
subset = metrics_df[metrics_df['metric_type'] == metric]
axes[i].plot(subset['Q&FY'], subset['metric_value'], marker='o', color=colors[i])
axes[i].set_title(f'Metric: {metric}', fontsize=16, color=colors[i])
axes[i].set_xlabel('Quarter', fontsize=14)
axes[i].set_ylabel('Metric Value', fontsize=14)
axes[i].tick_params(axis='x', labelsize=12, rotation=45)
axes[i].tick_params(axis='y', labelsize=12)
axes[i].grid(visible=True, linestyle='--', alpha=0.5)
# Adjusting the layout
plt.tight_layout()
# Showing the plots
plt.show()
Picking out two interesting dates to explore based on the charts above of key financial metrics:
1Q22
In 1Q22, the bank faced significant financial challenges, marked by a low CET1 capital ratio and declining net income and EPS. This suggests reduced profitability and shareholder returns. The increase in provisions for credit losses suggests heightened caution regarding potential loan defaults. Overall, this quarter indicates a period of financial strain and risk management.
2Q24
In 2Q24, the bank reached peak performance across several key metrics, with its CET1 capital ratio, net income, and EPS all at their highest levels. This suggests a strong capital position, robust profitability, and high returns to shareholders. Provisions for credit losses, however, remained elevated. This indicates a continued cautious approach toward potential credit risks. Overall, this quarter indicates a period of strong financial results although the bank is maintaining caution to guard against possible economic uncertainties or loan defaults.
# Converting the date column to datetime
transcripts_df['date'] = pd.to_datetime(transcripts_df['date'])
# Creating separate dataframes for the Apr-22 and Jul-24 Q&As
apr_22 = transcripts_df[(transcripts_df['date'] >= '2022-04-01') & (transcripts_df['date'] < '2022-05-01')]
jul_24 = transcripts_df[(transcripts_df['date'] >= '2024-07-01') & (transcripts_df['date'] < '2024-08-01')]
# Pulling all words from qa_text_processed column into a list
apr_22_all_words = [word for tokens in apr_22['qa_text_processed'] for word in tokens]
jul_24_all_words = [word for tokens in jul_24['qa_text_processed'] for word in tokens]
# Calculating the frequency distribution of the words from each dataset
apr_22_freq_dist = FreqDist(apr_22_all_words)
jul_24_freq_dist = FreqDist(jul_24_all_words)
# Getting the top 10 words and their frequencies from the Apr-22 Q&As
apr_22_top_10 = apr_22_freq_dist.most_common(10)
apr_22_words, apr_22_counts = zip(*apr_22_top_10)
# Getting the top 10 words and their frequencies from the Jul-24 Q&As
jul_24_top_10 = jul_24_freq_dist.most_common(10)
jul_24_words, jul_24_counts = zip(*jul_24_top_10)
# Creating a figure with 1 row and 2 columns of subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))
# Plotting the top 10 words in a barplot for the Apr-22 Q&As
sns.barplot(x=list(apr_22_words), y=list(apr_22_counts), palette="viridis", ax=ax1)
ax1.set_title('Top 10 Words in the April 2022 Q&As')
ax1.set_xlabel('Words')
ax1.set_ylabel('Frequency')
ax1.set_xticklabels(ax1.get_xticklabels(), rotation=45)
# Plotting the top 10 words in a barplot for the Jul-24 Q&As
sns.barplot(x=list(jul_24_words), y=list(jul_24_counts), palette="viridis", ax=ax2)
ax2.set_title('Top 10 Words in the July 2024 Q&As')
ax2.set_xlabel('Words')
ax2.set_ylabel('Frequency')
ax2.set_xticklabels(ax2.get_xticklabels(), rotation=45)
# Adjusting the layout
plt.tight_layout()
# Showing the plots
plt.show()
# Creating a dataframe excluding Apr-22 Q&As
not_apr_22 = transcripts_df[(transcripts_df['date'] < '2022-04-01') | (transcripts_df['date'] >= '2022-05-01')]
# Creating a dataframe excluding Jul-24 Q&As
not_jul_24 = transcripts_df[(transcripts_df['date'] < '2024-07-01') | (transcripts_df['date'] >= '2024-08-01')]
# Getting unique words for the Apr-22 Q&As
apr_22_unique_words = set(apr_22_freq_dist.keys()) - set(not_apr_22['qa_text_processed'].explode())
apr_22_unique_counts = {word: apr_22_freq_dist[word] for word in apr_22_unique_words}
apr_22_unique_df = pd.DataFrame(apr_22_unique_counts.items(), columns=['word', 'apr_22_frequency'])
# Getting unique words for the Jul-24 Q&As
jul_24_unique_words = set(jul_24_freq_dist.keys()) - set(not_jul_24['qa_text_processed'].explode())
jul_24_unique_counts = {word: jul_24_freq_dist[word] for word in jul_24_unique_words}
jul_24_unique_df = pd.DataFrame(jul_24_unique_counts.items(), columns=['word', 'jul_24_frequency'])
# Getting the top 10 unique words for Apr-22 and Jul-24
apr_22_top_words = apr_22_unique_df.nlargest(10, 'apr_22_frequency')
jul_24_top_words = jul_24_unique_df.nlargest(10, 'jul_24_frequency')
# Setting up the plot
plt.figure(figsize=(14, 6))
# Creating a bar plot for the top 10 Apr-22 exclusive words
plt.subplot(1, 2, 1)
sns.barplot(data=apr_22_top_words, x='word', y='apr_22_frequency', palette='Blues')
plt.title('Top 10 Unique Words Used in the April 2022 Q&As')
plt.xlabel('Words')
plt.ylabel('Frequency')
plt.xticks(rotation=45)
# Creating a bar plot for the top 10 Jul-24 exclusive words
plt.subplot(1, 2, 2)
sns.barplot(data=jul_24_top_words, x='word', y='jul_24_frequency', palette='Oranges')
plt.title('Top 10 Unique Words Used in the July 2024 Q&As')
plt.xlabel('Words')
plt.ylabel('Frequency')
plt.xticks(rotation=45)
# Adjusting the layout
plt.tight_layout()
# Showing the plots
plt.show()
In 1Q22, the term "russia-associated" is likely tied to the Russia-Ukraine conflict. Similarly, the word "nickel" may indicate exposure to commodity price volatility, possibly also linked to that conflict.
In 2Q24, words like "index" and "governor" indicate attention to macroeconomic indicators and regulatory guidance.
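The quarter-exclusive word logic used above (words appearing in the target quarter but in no other quarter) can be illustrated on toy data; the tokens below are made up for demonstration:

```python
from collections import Counter

# Toy illustration of the set-difference approach: words unique to a target
# quarter, with their in-quarter frequencies. Tokens here are invented.
target_tokens = ["nickel", "capital", "nii", "nickel"]
other_tokens = ["capital", "nii", "deposit"]

target_freq = Counter(target_tokens)
unique_words = set(target_freq) - set(other_tokens)
unique_counts = {w: target_freq[w] for w in unique_words}
print(unique_counts)  # {'nickel': 2}
```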
1.3.3 Emerging risks
To explore emerging risks, let's focus on the last two quarters.
# Getting the unique metric types
unique_metric_types = metrics_df['metric_type'].unique()
# Defining colors for each plot
colors = ['blue', 'green', 'red', 'purple']
# Plotting in a 2x2 grid
fig, axes = plt.subplots(2, 2, figsize=(18, 9))
# Flattening axes array for easy iteration
axes = axes.flatten()
# Looping through metric types and corresponding axes
for i, metric in enumerate(unique_metric_types):
subset = metrics_df[metrics_df['metric_type'] == metric]
axes[i].plot(subset['Q&FY'], subset['metric_value'], marker='o', color=colors[i])
axes[i].set_title(f'Metric: {metric}', fontsize=16, color=colors[i])
axes[i].set_xlabel('Quarter', fontsize=14)
axes[i].set_ylabel('Metric Value', fontsize=14)
axes[i].tick_params(axis='x', labelsize=12, rotation=45)
axes[i].tick_params(axis='y', labelsize=12)
axes[i].grid(visible=True, linestyle='--', alpha=0.5)
# Adjusting the layout
plt.tight_layout()
# Showing the plots
plt.show()
# Creating a dataframe for the 2Q24 and 3Q24 Q&As
Q23_FY24 = transcripts_df[(transcripts_df['date'] >= '2024-07-01') & (transcripts_df['date'] < '2024-11-01')]
# Pulling all words from qa_text_processed column into a list
Q23_FY24_all_words = [word for tokens in Q23_FY24['qa_text_processed'] for word in tokens]
# Calculating the frequency distribution of the words from the dataset
Q23_FY24_freq_dist = FreqDist(Q23_FY24_all_words)
# Getting the top 10 words and their frequencies from the 2Q24 and 3Q24 Q&As
Q23_FY24_top_10 = Q23_FY24_freq_dist.most_common(10)
Q23_FY24_words, Q23_FY24_counts = zip(*Q23_FY24_top_10)
# Setting up the plot
fig, ax1 = plt.subplots(figsize=(14, 6))
# Plotting the top 10 words in a barplot for the 2Q24 and 3Q24 Q&As
sns.barplot(x=list(Q23_FY24_words), y=list(Q23_FY24_counts), palette="viridis", ax=ax1)
ax1.set_title('Top 10 Words in the 2Q24 and 3Q24 Q&As')
ax1.set_xlabel('Words')
ax1.set_ylabel('Frequency')
ax1.set_xticklabels(ax1.get_xticklabels(), rotation=45)
# Adjusting the layout
plt.tight_layout()
# Showing the plots
plt.show()
# Creating a dataframe excluding 2Q24 and 3Q24 Q&As
not_Q23_FY24 = transcripts_df[(transcripts_df['date'] < '2024-07-01') | (transcripts_df['date'] >= '2024-11-01')]
# Getting unique words for the 2Q24 and 3Q24 Q&As
Q23_FY24_unique_words = set(Q23_FY24_freq_dist.keys()) - set(not_Q23_FY24['qa_text_processed'].explode())
Q23_FY24_unique_counts = {word: Q23_FY24_freq_dist[word] for word in Q23_FY24_unique_words}
Q23_FY24_unique_df = pd.DataFrame(Q23_FY24_unique_counts.items(), columns=['word', 'Q23_FY24_frequency'])
# Getting the top 10 unique words for the 2Q24 and 3Q24 Q&As
Q23_FY24_top_words = Q23_FY24_unique_df.nlargest(10, 'Q23_FY24_frequency')
# Setting up the plot
fig, ax1 = plt.subplots(figsize=(14, 6))
# Creating a bar plot for the top 10 2Q24 and 3Q24 exclusive words
sns.barplot(data=Q23_FY24_top_words, x='word', y='Q23_FY24_frequency', palette='Blues')
plt.title('Top 10 Unique Words Used in the 2Q24 and 3Q24 Q&As')
plt.xlabel('Words')
plt.ylabel('Frequency')
plt.xticks(rotation=45)
# Adjusting the layout
plt.tight_layout()
# Showing the plots
plt.show()
Words like βspikeβ and βtroughβ could reflect the recent fluctuations in net income and EPS.
# Calculating word frequencies and total word count in 2Q24 and 3Q24 Q&As
Q23_FY24_freq = transcripts_df[(transcripts_df['date'] >= '2024-07-01') & (transcripts_df['date'] < '2024-11-01')]
Q23_FY24_word_counts = Q23_FY24_freq['qa_text_processed'].explode().value_counts()
Q23_FY24_total_words = Q23_FY24_word_counts.sum() # Total word count for 2Q24 & 3Q24
# Calculating word frequencies and total word count in other quarters
not_Q23_FY24_word_counts = not_Q23_FY24['qa_text_processed'].explode().value_counts()
not_Q23_FY24_total_words = not_Q23_FY24_word_counts.sum() # Total word count for other quarters
# Creating a DataFrame comparing relative frequencies
word_comparison_df = pd.DataFrame({
'Q23_FY24_proportion': Q23_FY24_word_counts / Q23_FY24_total_words,
'other_quarters_proportion': not_Q23_FY24_word_counts / not_Q23_FY24_total_words
}).fillna(0)
# Adding a column for proportion difference
word_comparison_df['proportion_difference'] = word_comparison_df['Q23_FY24_proportion'] - word_comparison_df['other_quarters_proportion']
# Filtering for words that are relatively more frequent in 2Q24 and 3Q24
higher_in_Q23_FY24 = word_comparison_df[word_comparison_df['proportion_difference'] > 0]
# Selecting the top 10 words with the highest proportion difference
top_higher_words = higher_in_Q23_FY24.nlargest(10, 'proportion_difference').reset_index().rename(columns={'qa_text_processed': 'word'})
# Plotting the top words by proportional difference
fig, ax = plt.subplots(figsize=(14, 6))
sns.barplot(data=top_higher_words, x='word', y='proportion_difference', palette='Purples')
plt.title('Top 10 Words with Higher Proportion in 2Q24 and 3Q24 Compared to Other Quarters')
plt.xlabel('Words')
plt.ylabel('Proportional Difference')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
A higher usage of "capital" could relate to how the bank has maintained a high CET1 capital ratio, indicating a strong capital position.
A higher usage of "nii" (net interest income) could relate to the recent fluctuations in net income and EPS.
2 Selecting Models and Scalability Assessment
2.0 Creation of evaluation dataset
To evaluate model performance and inform the choice of the best model for further analysis, we create two datasets:
Synthetic Data: Text data with predefined sentiment, topic, and evasion labels was generated by GPT-4 and extensively refined manually to ensure accuracy. This dataset allowed us to test the models against controlled baselines and evaluate their precision in tasks with known outcomes. This file is created manually and uploaded to the clean data folder.
Ground Truth Data: Two transcripts' worth of texts were randomly sampled and manually annotated for sentiment, topic, and question-evasion status. This dataset provided a benchmark for assessing the models' real-world performance. The code selects random rows to form a ground truth dataset, saved as an .xlsx file, which we then labelled manually.
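Once model predictions exist for the ground truth rows, agreement can be measured as simple label accuracy. A minimal sketch; the column names (`true_sentiment`, `pred_sentiment`) and values are hypothetical:

```python
import pandas as pd

# Hypothetical sketch of comparing model output against the manually
# annotated ground truth. The 'pred_sentiment' column is an assumption,
# standing in for whichever model's predictions are being evaluated.
gt = pd.DataFrame({
    "true_sentiment": ["positive", "neutral", "negative", "neutral"],
    "pred_sentiment": ["positive", "neutral", "neutral", "neutral"],
})
accuracy = (gt["true_sentiment"] == gt["pred_sentiment"]).mean()
print(f"Sentiment accuracy: {accuracy:.2f}")  # Sentiment accuracy: 0.75
```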
2.0.1 Selecting ground truth data
def create_ground_truth(folder_path, output_folder_path):
# define the bank of interest
banks = ['JPMorgan']
print("Which bank's data do you wish to label?")
bank = input(banks)
label_types = ['sentiment', 'topic', 'Q_outcome']
# create a new ground truth file
print("Creating a new ground truth dataset...")
transcript_csv_path = os.path.join(folder_path, f"transcripts_tabular_{bank}_clean.csv")
transcripts_all = pd.read_csv(transcript_csv_path)
transcripts_qa = transcripts_all[transcripts_all.section=='questions_answers'].reset_index()
# consider only the qa_num where there are both Q and A (i.e. at least two rows per qa_num)
qa_nums_all = transcripts_qa.qa_num.value_counts().loc[lambda x: x > 1].index.unique()
median_num_Q = int(transcripts_all.qa_num_within.median())
print(f"Median number of Q&As per transcript: {median_num_Q}")
# how many transcripts should the Qs represent?
NUM_TRANSCRIPTS = 2
SEED = 42
# sample unique Q&As
random.seed(SEED)
qa_num_sample = random.sample(list(qa_nums_all), int(NUM_TRANSCRIPTS*median_num_Q))
ground_truth = transcripts_all[transcripts_all.qa_num.isin(qa_num_sample)]
ground_truth = ground_truth[['uid', 'qa_type', 'qa_num', 'qa_text']]
for col in label_types:
ground_truth[f"true_{col}"] = np.nan
ground_truth_path = os.path.join(output_folder_path, f"ground_truth_{bank}_manual.xlsx")
ground_truth.to_excel(ground_truth_path, index=False)
create_ground_truth(processed_data_folder, output_data_folder)
The file is saved as an Excel file in the output_data folder. We then manually labelled the topic, sentiment, and evasion status for each text in Google Sheets, with ChatGPT assisting the decision-making. This (human-annotated) ground truth file was then saved in the processed_data folder and used for all subsequent ground truth analyses.
2.1 Phi 3.5 for summarisation
SECTION TAKES UP TO 32GB RAM TO RUN - A100 GPU
2.1.0 Phi 3.5 initialisation
# Initialise the pipeline - note T4 GPU does not contain enough RAM, so use CPU and ignore warning OR run on A100 GPU
pipe = pipeline("text-generation", model="microsoft/Phi-3.5-mini-instruct", trust_remote_code=True, device=0)
2.1.1 Summarisation function using Phi 3.5
The prompt we are using is:
Summarise the following text in a consistent, concise format. Limit to 1-2 sentences focusing only on main financial themes, metrics, and indicators relevant to financial performance. Avoid interpretations, bullet points, and variable formatting. Do not allow style drift in the answers. \n\nText:\n{text}\n\nSummary:
where {text} is the text from each question or answer.
# Summarisation function
def phi_summarise(input_df, input_col, batch_size=8):
"""
Function to Summarise text when given a pre-processed Q&A table as input
- Focusses on financial themes
- Summarises in 1-2 sentences
"""
start_time = time.time()
total_count = len(input_df)
x = 0
# Initialise a new column for summarised text
input_df['summarised_text'] = ""
# Process the DataFrame in batches
for start in range(0, total_count, batch_size):
end = min(start + batch_size, total_count)
batch_texts = input_df[input_col][start:end].tolist()
# Define prompt for each text in the batch
prompt = [
(
f"Summarise the following text in a consistent, concise format. Limit to 1-2 sentences focusing only on main financial themes, "
f"metrics, and indicators relevant to financial performance. Avoid interpretations, bullet points, and variable formatting. "
f"Do not allow style drift in the answers. \n\nText:\n{text}\n\nSummary:"
)
for text in batch_texts
]
print(f"Processing batch {start // batch_size + 1}/{(total_count + batch_size - 1) // batch_size}")
# Run the model on the batch prompt
batch_summaries = pipe(prompt, max_new_tokens=75, do_sample=False)
# Extract summaries and clean up the text
for i, summary in enumerate(batch_summaries):
generated_text = summary[0]['generated_text'].replace(prompt[i], "").replace("Text:\n", "").replace("Summary:", "").strip()
first_line = generated_text.split('\n', 1)[0].strip()
cleaned_text = re.sub(r"#[-•]\s*", "", first_line).strip()
# Update Input DataFrame with the cleaned summary
input_df.at[start + i, 'summarised_text'] = cleaned_text
x += 1
# Calculate time for full dataset
end_time = time.time()
time_taken = end_time - start_time
print(f"Time taken for {total_count} rows: {round(time_taken/60, 2)} minutes")
print(f"Estimate for all transcripts (927): {round((time_taken/total_count * 927)/60/60, 2)} hours")
return input_df
2.1.2 Generate summarised text for Q&A tables
# Load ground truth files
ground_truth_df = pd.read_excel(processed_data_folder + "/ground_truth_JPMorgan_manual.xlsx")
# Load full Q&A table
qa_df = pd.read_csv(processed_data_folder + "/transcripts_tabular_JPMorgan_clean.csv")
# Run summarisation function on Ground Truth Q&A dataset (not aggregated)
summarised_df_gt = phi_summarise(ground_truth_df, 'qa_text', 8)
display(summarised_df_gt.head())
| | uid | qa_type | qa_num | qa_text | true_sentiment | true_topic | true_Q_outcome |
|---|---|---|---|---|---|---|---|
| 0 | JPMorganChase_3Q24_Q_28.0 | Q | 28 | And so Daniel's comments in September were on ... | neutral | Earnings | NaN |
| 1 | JPMorganChase_3Q24_A_28.0 | A | 28 | No. Those were core NII or NII ex.. So again, ... | negative | Earnings | NE |
| 2 | JPMorganChase_2Q24_Q_26.0 | Q | 62 | Very good. And as a follow-up, you've been ver... | neutral | Financials | NaN |
| 3 | JPMorganChase_2Q24_A_26.0 | A | 62 | Yeah. It's a good question. I think the short ... | positive | Financials | NE |
| 4 | JPMorganChase_1Q24_Q_30.0 | Q | 97 | Thank you. And I guess, as a tie-in to that qu... | neutral | M&A / Investments | NaN |
Processing batch 1/14 … Processing batch 14/14
Time taken for 108 rows: 6.94 minutes
Estimate for all transcripts (927): 0.99 hours
| | uid | qa_type | qa_num | qa_text | true_sentiment | true_topic | true_Q_outcome | summarised_text |
|---|---|---|---|---|---|---|---|---|
| 0 | JPMorganChase_3Q24_Q_28.0 | Q | 28 | And so Daniel's comments in September were on ... | neutral | Earnings | NaN | Daniel seeks clarification on the breakdown of... |
| 1 | JPMorganChase_3Q24_A_28.0 | A | 28 | No. Those were core NII or NII ex.. So again, ... | negative | Earnings | NE | The current consensus for NII ex. is $87 billi... |
| 2 | JPMorganChase_2Q24_Q_26.0 | Q | 62 | Very good. And as a follow-up, you've been ver... | neutral | Financials | NaN | The discussion highlights the strong performan... |
| 3 | JPMorganChase_2Q24_A_26.0 | A | 62 | Yeah. It's a good question. I think the short ... | positive | Financials | NE | The C&I charge-off rate, historically very low... |
| 4 | JPMorganChase_1Q24_Q_30.0 | Q | 97 | Thank you. And I guess, as a tie-in to that qu... | neutral | M&A / Investments | NaN | The speaker discusses the growth of private cr... |
# Save summarised ground truth file
summarised_df_gt.to_excel(processed_data_folder + "/phi_ground_truth_summarised.xlsx", index=False)
# Run summarisation function on full Q&A tabular dataset (note that 'N' values have been removed, so only Q and A remain)
summarised_df_full = phi_summarise(qa_df, 'qa_text', 8)
# View results
display(summarised_df_full.head())
| | uid | bank | year | quarter | date | section | name | title | firm | qa_type | qa_num_within | qa_num | qa_text |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | JPMorganChase_3Q24_N_0.0 | JPMorganChase | 2024 | 3 | 2024-10-11 | management_discussion | Jeremy Barnum | Chief Financial Officer | JPMorgan Chase & Co. | N | NaN | NaN | Thank you and good morning, everyone. Starting... |
| 1 | JPMorganChase_3Q24_Q_1.0 | JPMorganChase | 2024 | 3 | 2024-10-11 | questions_answers | Jim Mitchell | Analyst | Seaport Global Securities LLC | Q | 1.0 | 1.0 | Hey, good morning. So, Jeremy, as you highligh... |
| 2 | JPMorganChase_3Q24_A_1.0 | JPMorganChase | 2024 | 3 | 2024-10-11 | questions_answers | Jeremy Barnum | Chief Financial Officer | JPMorgan Chase & Co. | A | 1.0 | 1.0 | Yeah. Sure, Jim. I'll try to answer both quest... |
| 3 | JPMorganChase_3Q24_Q_3.0 | JPMorganChase | 2024 | 3 | 2024-10-11 | questions_answers | Steven Chubak | Analyst | Wolfe Research LLC | Q | 3.0 | 3.0 | Hi. Good morning. So Jeremy, how are you? So I... |
| 4 | JPMorganChase_3Q24_A_3.0 | JPMorganChase | 2024 | 3 | 2024-10-11 | questions_answers | Jeremy Barnum | Chief Financial Officer | JPMorgan Chase & Co. | A | 3.0 | 3.0 | Sure. So good question and I agree with your n... |
Processing batch 1/116 … Processing batch 116/116
Time taken for 926 rows: 58.79 minutes
Estimate for all transcripts (927): 0.98 hours
| | uid | bank | year | quarter | date | section | name | title | firm | qa_type | qa_num_within | qa_num | qa_text | summarised_text |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | JPMorganChase_3Q24_N_0.0 | JPMorganChase | 2024 | 3 | 2024-10-11 | management_discussion | Jeremy Barnum | Chief Financial Officer | JPMorgan Chase & Co. | N | NaN | NaN | Thank you and good morning, everyone. Starting... | The Firm reported a net income of $12.9 billio... |
| 1 | JPMorganChase_3Q24_Q_1.0 | JPMorganChase | 2024 | 3 | 2024-10-11 | questions_answers | Jim Mitchell | Analyst | Seaport Global Securities LLC | Q | 1.0 | 1.0 | Hey, good morning. So, Jeremy, as you highligh... | The text discusses concerns about a significan... |
| 2 | JPMorganChase_3Q24_A_1.0 | JPMorganChase | 2024 | 3 | 2024-10-11 | questions_answers | Jeremy Barnum | Chief Financial Officer | JPMorgan Chase & Co. | A | 1.0 | 1.0 | Yeah. Sure, Jim. I'll try to answer both quest... | The yield curve is the primary factor driving ... |
| 3 | JPMorganChase_3Q24_Q_3.0 | JPMorganChase | 2024 | 3 | 2024-10-11 | questions_answers | Steven Chubak | Analyst | Wolfe Research LLC | Q | 3.0 | 3.0 | Hi. Good morning. So Jeremy, how are you? So I... | The consensus expense forecast for the next ye... |
| 4 | JPMorganChase_3Q24_A_3.0 | JPMorganChase | 2024 | 3 | 2024-10-11 | questions_answers | Jeremy Barnum | Chief Financial Officer | JPMorgan Chase & Co. | A | 3.0 | 3.0 | Sure. So good question and I agree with your n... | The company's financial performance is influen... |
# Save summarised file
summarised_df_gt.to_excel(processed_data_folder + "/phi_fulltable_summarised.xlsx", index=False)
These summarised tables (ground truth and full) will be passed as input to FinBERT and compared to the analysis on the non-summarised table data.
2.2 Sentiment Analysis
We compare two sentiment classification models, "yiyanghkust/finbert-tone" and "soleimanian/financial-roberta-large-sentiment", both from Hugging Face. Both return 'positive', 'negative' or 'neutral', though the first model capitalises the first letter of its labels.
FinBERT-tone is a version of FinBERT fine-tuned for sentiment classification on 10,000 manually annotated sentences from analyst reports. The RoBERTa model, on the other hand, was trained on a large corpus including CSR reports, ESG news and earnings call transcripts.
We use the ground truth dataset to evaluate which model performs better.
2.2.0 Load the ground truth dataset
A new column is created in the evaluation dataset where the labels are converted to integers (positive to 1, negative to -1, neutral to 0).
score_dict = {'neutral':0, 'positive':1, 'negative':-1}
eval_data = pd.read_excel(processed_data_folder + "/ground_truth_JPMorgan_manual.xlsx")
eval_data.drop(['qa_type', 'qa_num', 'true_topic', 'true_Q_outcome'], inplace=True, axis=1)
eval_data['true_score'] = eval_data['true_sentiment'].map(score_dict)
2.2.1 Class for running a sentiment classification model
There are multiple ways to use the sentiment analysis models. The most straightforward is to get a label for each text and compare the accuracy, precision, recall and F1-score of the two models against the ground truth. This is stored in a column with suffix '_sentiment'.
Alternatively, we can use the probability outputs for each label to compute a numeric score between -1 and 1. For each text, the formula is simply $$ \text{score} = \text{probability of positive} - \text{probability of negative.}$$ This is stored in a column with suffix '_score'. Then, by converting the true labels to -1, 0 and 1, we can compare the MSE of the two models.
Another way of quantifying the output is to break each text into sentences and feed each sentence to the model. A score can then be computed from the number of positive and negative sentences: $$ \text{score} = \frac{\text{number of positive sentences} - \text{number of negative sentences}}{\text{number of positive sentences} + \text{number of negative sentences}}.$$ This is stored in a column with suffix '_average_sentence_label_score'. We can then compare the MSE of the two models.
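To make these formulas concrete, here is a small worked example with hypothetical probabilities and sentence counts (no model call involved):

```python
# Hypothetical classifier output for one text (probabilities sum to 1)
response = [
    {'label': 'positive', 'score': 0.70},
    {'label': 'neutral',  'score': 0.20},
    {'label': 'negative', 'score': 0.10},
]

# Probability-based score = P(positive) - P(negative)
prob_score = (next(r['score'] for r in response if r['label'] == 'positive')
              - next(r['score'] for r in response if r['label'] == 'negative'))
print(round(prob_score, 2))  # 0.6

# Sentence-based score, e.g. 3 positive, 1 negative, 2 neutral sentences;
# neutral sentences do not enter the formula
num_pos, num_neg = 3, 1
sentence_score = (num_pos - num_neg) / (num_pos + num_neg)
print(sentence_score)  # 0.5
```

Note that a text with many neutral sentences and one positive sentence would still score 1.0 under the sentence-based scheme, which is one reason to compare both schemes against the ground truth.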
## Helper functions
def get_quantiles(num_list, num_chunks):
"""
Calculate approximate quantiles from a list of numbers, dividing the list into equal chunks
Args:
num_list (list of int or float): A list of numeric values to be split into approximately equal chunks, pre-sorted.
num_chunks (int): Number of chunks the list to be split into.
Returns:
list of float or int: A list of values from 'num_list' that correspond to approximate quantiles.
"""
quantiles = [(i + 1) / num_chunks for i in range(num_chunks - 1)]
selected_positions = []
for q in quantiles:
if num_list:
pos_index = int(len(num_list) * q) - 1
if pos_index >= 0:
selected_positions.append(num_list[min(pos_index, len(num_list) - 1)])
return selected_positions
def chunk_text(text, num_chunks):
"""
Split a given text into approximately equal chunks based on positions of periods.
Args:
text (str): The text to be split into chunks.
num_chunks (int): The number of chunks to create.
Returns:
list of str: A list of text chunks.
"""
positions = [index for index, char in enumerate(text) if char == '.']
if not positions:
# no periods found: return the whole text as a single chunk
return [text]
split_positions = get_quantiles(positions, num_chunks)
last_position = 0
chunks = []
for pos in split_positions:
chunks.append(text[last_position:pos + 1].strip())
last_position = pos + 1
chunks.append(text[last_position:].strip())
return chunks
def compute_score_from_prob(response):
"""
Compute a sentiment score based on probabilities for positive and negative labels of a response.
The formula is probability of positive - probability of negative.
Args:
response (list of dict): A list of dictionaries where each dictionary contains a `label` and a `score`. It would work on any classification response from huggingface.
Returns:
float: The computed sentiment score.
"""
positive_score = next(item['score'] for item in response if item['label'].lower() == 'positive')
negative_score = next(item['score'] for item in response if item['label'].lower() == 'negative')
return positive_score - negative_score
def split_string_by_punctuation(text):
"""
Split a given text into sentences based on specific punctuation marks, while avoiding splitting on periods in abbreviations like i.e. and U.S..
Args:
text (str): The text to be split into sentences.
Returns:
list of str: A list of sentences.
"""
# note: literal dots in the abbreviations must be escaped
punct_regex = r"(?=\S)(?:i\.e\.|J\.P\.|U\.S\.|ex\.|[A-Z][a-z]{0,3}\.|[^.?!]|\.(?!\s+[A-Z]))*.?"
return re.findall(punct_regex, text)
def compute_label_from_score(score):
if score > 0.2:
return 'positive'
elif score < -0.2:
return 'negative'
else:
return 'neutral'
## Class
class Model:
"""
Base class for using huggingface model
Args:
name (str): Custom name of the model to be used in dataframe heading.
huggingface_model (str): The name of the model on huggingface hub, has a structure of {creator}/{model_name}
task (str): The machine learning task the model performs (should be one of many from huggingface)
"""
def __init__(self, name, huggingface_model, task):
self.name = name
self.huggingface_model = huggingface_model
self.task = task
self.model = pipeline(self.task, model=self.huggingface_model)
self.tokenizer = AutoTokenizer.from_pretrained(self.huggingface_model)
self.token_max_length = self.model.model.config.max_position_embeddings
def get_model_response_for_df(self, data, column):
pass
class ClassificationModel(Model):
def __init__(self, name, huggingface_model):
super().__init__(name, huggingface_model, "text-classification")
def get_model_response_for_df(self, data, column):
data[self.name + '_sentiment'], data[self.name + '_score'] = zip(
*data.apply(lambda x: self.get_model_response(x, column), axis=1))
return data
def get_model_response(self, row, column):
input = row[column]
token_length = len(self.tokenizer(input)['input_ids'])
if token_length <= self.token_max_length:
response = self.model(input, top_k=3)
label = response[0]['label'].lower()
score = compute_score_from_prob(response)
return label, score
else:
chunk_number = token_length // self.token_max_length + 1
chunks = chunk_text(input, chunk_number)
# iterate over a copy: mutating a list while looping over it skips elements
for chunk in list(chunks):
chunk_token_length = len(self.tokenizer(chunk)['input_ids'])
if chunk_token_length > self.token_max_length:
chunk_number = chunk_token_length // self.token_max_length + 1
chunks.remove(chunk)
chunks.extend(chunk_text(chunk, chunk_number))
responses = self.model(chunks, top_k=3)
scores = [compute_score_from_prob(response) for response in responses]
score = sum(scores) / len(scores)
label = compute_label_from_score(score)
return label, score
def get_model_response_from_sentences_df(self, data, column):
data[self.name + '_average_sentence_label_score'] = data.apply(
lambda x: self.get_model_response_from_sentences(x, column), axis=1)
return data
def get_model_response_from_sentences(self, row, column):
input = row[column]
sentences = split_string_by_punctuation(input)
num_pos = 0
num_neg = 0
for sentence in sentences:
response = self.model(sentence)[0]['label'].lower()
if response == 'positive':
num_pos += 1
if response == 'negative':
num_neg += 1
if num_pos == 0 and num_neg == 0:
return 0
else:
return (num_pos - num_neg) / (num_pos + num_neg)
2.2.2 Comparing the two models
def get_model_accuracy(model_name, data):
true_response = data['true_sentiment'].to_list()
model_response = data[model_name+'_sentiment'].to_list()
cf = ConfusionMatrixDisplay(confusion_matrix(true_response, model_response), display_labels=['negative', 'neutral', 'positive'])
cf.plot()
colorbar = cf.ax_.images[0].colorbar
colorbar.set_label('Frequency')
plt.title(f"Confusion matrix for {model_name}")
plt.show()
print(classification_report(true_response, model_response))
# Load the two models and run them on the ground truth data set
finbert_model = ClassificationModel("finbert-tone", "yiyanghkust/finbert-tone")
eval_data = finbert_model.get_model_response_for_df(eval_data, 'qa_text')
roberta_model = ClassificationModel("financial-roberta-large", "soleimanian/financial-roberta-large-sentiment")
eval_data = roberta_model.get_model_response_for_df(eval_data, 'qa_text')
# save the results of ground truth data set in the output folder
eval_data.to_csv(output_data_folder + "/sentiment_eval_result_ground_truth.csv", index=False)
from IPython.display import display
from ipywidgets import Output, HBox
out1 = Output()
out2 = Output()
with out1:
get_model_accuracy("finbert-tone", eval_data)
with out2:
get_model_accuracy("financial-roberta-large", eval_data)
display(HBox([out1, out2]))
From the results, we can see that the RoBERTa model has a slightly higher accuracy of 67% compared to FinBERT's 65%. Although the two models have a similar F1-score for 'neutral', FinBERT's precision and recall are unbalanced, suggesting that it is quite conservative and predicts most labels as 'neutral'. This is also reflected in the recall for 'positive' and 'negative'.
For our analysis, we are more interested in the positive and negative classifications than the neutral ones, so the RoBERTa model is the better choice.
2.2.3 Running the model on the full data set
# load the full dataset
full_data = pd.read_csv(processed_data_folder + "/transcripts_tabular_JPMorgan_clean.csv")
full_data = full_data[full_data['qa_type'].isin(['Q', 'A'])]
# running the model
roberta_model = ClassificationModel("financial-roberta-large", "soleimanian/financial-roberta-large-sentiment")
full_data = roberta_model.get_model_response_for_df(full_data, 'qa_text')
# save the response
full_data.to_csv(output_data_folder + "/sentiment_full_result.csv", index=False)
2.3 Topic modelling
2.3.1 FinBERT classification
2.3.1.0 Setting up the model
# connecting to huggingface
huggingface_token = userdata.get("huggingface_token")
!huggingface-cli login --token $huggingface_token
# loading the tokeniser and the model
finbert_topic_tokeniser = AutoTokenizer.from_pretrained("nickmuchi/finbert-tone-finetuned-finance-topic-classification")
finbert_topic_model = AutoModelForSequenceClassification.from_pretrained("nickmuchi/finbert-tone-finetuned-finance-topic-classification")
# define a label dictionary to help us interpret the labels predicted by the model
# (obtained from HuggingFace)
id2label= {
0: "Analyst Update",
1: "Fed | Central Banks",
2: "Company | Product News",
3: "Treasuries | Corporate Debt",
4: "Dividend",
5: "Earnings",
6: "Energy | Oil",
7: "Financials",
8: "Currencies",
9: "General News | Opinion",
10: "Gold | Metals | Materials",
11: "IPO",
12: "Legal | Regulation",
13: "M&A | Investments",
14: "Macro",
15: "Markets",
16: "Politics",
17: "Personnel Change",
18: "Stock Commentary",
19: "Stock Movement"
}
# Check token lengths
def plot_finbert_token_lengths(data_folder, finbert_tokeniser, max_length=512,
qa_only=True, summarised=False, appdx=""):
"""
given the path to the data folder and the FinBERT tokeniser,
this function returns an array of token lengths
and plots their distribution
max_length (int) : max token length - will be plotted!
qa_only (bool) : keeps only non-nan qa_num if true
"""
banks = ['JPMorgan']
print("Which bank's data do you wish to label?")
bank = input(banks)
# load the data
if summarised:
summarised_text_path = os.path.join(data_folder, f"phi_fulltable{appdx}.xlsx")
df = pd.read_excel(summarised_text_path)
df = df.drop(['qa_text'], axis=1)
df.rename(columns={'summarised_text': 'qa_text'}, inplace=True)
else:
transcript_csv_path = os.path.join(data_folder, f"transcripts_tabular_{bank}_clean.csv")
df = pd.read_csv(transcript_csv_path)
if qa_only:
df = df[~df.qa_num.isna()].reset_index(drop=True)
arr = df.qa_text
token_lengths = []
# estimate token length without truncation
for txt in arr:
inputs_nontruncated = finbert_tokeniser(
txt,
return_tensors="pt",
truncation=False,
padding=False
)
token_lengths.append(len(inputs_nontruncated[0]))
fig, ax = plt.subplots(1,1, figsize=(3,3))
sns.histplot(token_lengths, ax =ax)
ax.vlines(max_length, 0, 140, ls='dashed', color='black')
ax.set_xlabel("Token length")
ax.set_ylabel("Number of texts")
num_too_long = (np.array(token_lengths)>max_length).sum()
print(f"Max token length is exceeded by {num_too_long} ({num_too_long*100/arr.shape[0]:.0f}%) entries")
plot_finbert_token_lengths(processed_data_folder, finbert_topic_tokeniser, max_length=512)
Which bank's data do you wish to label? ['JPMorgan']JPMorgan Max token length is exceeded by 23 (3%) entries
FinBERT can only handle inputs of up to 512 tokens. If our texts are substantially longer, truncation could diminish the accuracy of the model. Therefore, we begin by running just the tokenisation step to check how long the texts are.
Evidently, only 3% of entries exceed the token limit, so this is not a major problem. Still, we will include chunking at least as an option in our FinBERT topic-assignment function. Depending on classification accuracy, it might even be worth considering chunk sizes shorter than the maximum allowed token length.
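The chunking rule used in the functions below follows directly from the token length: split a text into the minimum number of roughly equal chunks such that each fits within the limit. A minimal sketch (the helper name is illustrative):

```python
# Minimum number of ~equal chunks so each stays within the model's limit;
# mirrors the integer-division rule used in the topic-assignment functions.
def num_chunks_needed(token_length, max_length=512):
    return token_length // max_length + 1

print(num_chunks_needed(400))    # 1 chunk: the text already fits
print(num_chunks_needed(1300))   # 3 chunks of roughly 434 tokens each
```

One edge case of this rule: a text of exactly 512 tokens would be split into 2 chunks even though it fits, which is harmless but slightly conservative.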
2.3.1.1 Helper functions
2.3.1.1.1 Function for label prediction
We will define a function get_finbert_topics() that will take a string input (one question or answer) and generate:
- predicted label ID
- the corresponding label
- logits for all possible topics
If chunking=True, texts longer than max_length will be split into equal-sized chunks and a label will be predicted for each chunk. If the chunks have different labels, all will be returned.
# function for label prediction
def get_finbert_topics(txt, finbert_tokeniser, finbert_model, label_dict,
chunking=False, max_length=512):
"""
given an input string, tokeniser, model, and a dictionary of labels,
this function returns the predicted label id, predicted label, and the full
array of logits for that input string
chunking (bool): if True, texts longer than 512 tokens are split into the
minimum number of ~equal-size chunks possible for each chunk to be
smaller than 512 tokens. A label is then predicted for each chunk.
If the labels do not match, the label gets assigned based on the max
logit across chunks.
"""
MAX_LENGTH = max_length
if chunking:
# estimate token length without truncation
inputs_nontruncated = finbert_tokeniser(
txt,
return_tensors="pt",
truncation=False,
padding=False
)
token_length = len(inputs_nontruncated[0])
if token_length>MAX_LENGTH:
print(f"Token exceeds max length: {txt[:100]}")
num_chunks = (token_length//MAX_LENGTH)+1
txt_split = txt.split(" ")
chunk_length_str=int(np.ceil(len(txt_split)/num_chunks))
if len(txt_split)%num_chunks==1: # odd len(txt_split)
chunk_length_str += 1
txt_list = [" ".join(txt_split[i:i + chunk_length_str]) for i in range(0, len(txt_split), chunk_length_str)]
else:
txt_list = [txt]
else:
txt_list = [txt]
predicted_label_ids = []
predicted_logits = []
for i, txt_chunk in enumerate(txt_list):
inputs = finbert_tokeniser(
txt_chunk,
return_tensors="pt",
truncation=True,
padding=True,
max_length=MAX_LENGTH
)
with torch.no_grad():
outputs = finbert_model(**inputs)
logits = outputs.logits
# get the predicted label
predicted_label_ids.append(torch.argmax(logits, dim=1).item())
predicted_logits.append(logits.detach().cpu().numpy().flatten())
# check if predicted labels are the same for all chunks
# if so, assign a single label to the input row
if len(set(predicted_label_ids))==1:
predicted_label = label_dict[predicted_label_ids[0]]
return predicted_label_ids[0], predicted_label, predicted_logits[0], None
else:
# return all predictions, as well as the text chunks
predicted_labels = [label_dict[predicted_label_ids[i]] for i in range(len(predicted_label_ids))]
return predicted_label_ids, predicted_labels, predicted_logits, txt_list
2.3.1.1.2 Function for label storage
Next, we will define a function process_finbert() that:
- checks if a labelled dataset with the same chunking and max length already exists in the folder, else creates a new file
- runs get_finbert_topics() and handles multiple chunks:
- if multiple labels are returned, the first will replace the existing row and the others will be appended to the labelled dataset as new rows. The uid of such new rows will be the same as that of the existing row, to allow one-sided merging of this and the original dataset.
- allows for labelling of only the Q&A sections (qa_only=True) or the management discussion section as well (qa_only=False)
# Function for label storage
def process_finbert(finbert_tokeniser, finbert_model, save_folder, label_dict,
data_folder=None, chunking=False, max_length=512, qa_only=False,
summarised=False, synthetic=False, appdx=""):
# define the bank of interest
banks = ['JPMorgan']
print("Which bank's data do you wish to label?")
bank = input(banks)
# define the columns of interest
cols_to_add = ['finbert_topic_id', 'finbert_topic_label']
cols_to_add += [f"topic_{i}_logit" for i in label_dict.keys()]
# load (partly) labelled dataset (if it exists), else create the file
finbert_topics_path = os.path.join(save_folder, f"finbert_topics_{bank}_chunking{chunking}_maxlength{max_length}{appdx}.csv")
if os.path.exists(finbert_topics_path):
print("Loading (partly) labelled data...")
finbert_df = pd.read_csv(finbert_topics_path)
transcripts_qa = finbert_df.copy()
else:
print("Creating a new dataset for labelled data...")
if data_folder is None:
raise ValueError("Data folder not supplied!")
if summarised:
summarised_text_path = os.path.join(data_folder, f"phi_fulltable{appdx}.xlsx")
transcripts_all = pd.read_excel(summarised_text_path)
transcripts_all = transcripts_all.drop(['qa_text'], axis=1)
transcripts_all.rename(columns={'summarised_text': 'qa_text'}, inplace=True)
elif synthetic:
synthetic_text_path = os.path.join(data_folder, f"synthetic_data_for_finbert.csv")
transcripts_all = pd.read_csv(synthetic_text_path)
else:
transcript_csv_path = os.path.join(data_folder, f"transcripts_tabular_{bank}_clean.csv")
transcripts_all = pd.read_csv(transcript_csv_path)
# subset only Q&A
if qa_only:
transcripts_qa = transcripts_all[~transcripts_all['qa_num'].isna()].copy().reset_index(drop=True)
else:
transcripts_qa = transcripts_all.copy()
# keep only the strictly necessary subset of columns
cols_to_keep = ['uid', 'qa_text']
finbert_df = transcripts_qa[cols_to_keep].copy()
# add cols for topic labels
for col in cols_to_add:
finbert_df.loc[:,col] = np.nan
# save file
finbert_df.to_csv(finbert_topics_path, index=False)
# iterate over texts and save labels in each loop
for i, row in transcripts_qa.iterrows():
print(f"Processing text {i}/{transcripts_qa.shape[0]}...")
predicted_label_id, predicted_label, logits, chunks = get_finbert_topics(
txt=row['qa_text'],
finbert_tokeniser=finbert_tokeniser,
finbert_model=finbert_model,
label_dict=label_dict,
chunking=chunking,
max_length=max_length
)
if chunks is not None:
print(f"Multiple labels detected!")
print(f"-------label: {predicted_label[0]}")
finbert_df.loc[finbert_df['uid']==row['uid'], "qa_text"] = chunks[0]
finbert_df.loc[finbert_df['uid']==row['uid'], cols_to_add[0]] = predicted_label_id[0]
finbert_df.loc[finbert_df['uid']==row['uid'], cols_to_add[1]] = predicted_label[0]
finbert_df.loc[finbert_df['uid']==row['uid'], cols_to_add[2:]] = logits[0]
for k in range(1, len(chunks)):
print(f"-------label: {predicted_label[k]}")
row_to_add = pd.DataFrame({
"uid": [row['uid']],
"qa_text": [chunks[k]],
cols_to_add[0]: [predicted_label_id[k]],
cols_to_add[1]: [predicted_label[k]],
})
row_to_add[cols_to_add[2:]] = logits[k]
finbert_df = pd.concat((finbert_df, row_to_add), ignore_index=True)
else:
print(f"-------label: {predicted_label}")
finbert_df.loc[finbert_df.uid==row.uid, cols_to_add[0]] = predicted_label_id
finbert_df.loc[finbert_df.uid==row.uid, cols_to_add[1]] = predicted_label
finbert_df.loc[finbert_df.uid==row.uid, cols_to_add[2:]] = logits
# update the file
finbert_df.to_csv(finbert_topics_path, index=False)
2.3.1.1.3 Function for model evaluation
We have a "ground truth" dataset where topics from the same topic list have been manually assigned to two transcripts' worth of randomly sampled question-answer pairs. This dataset can be used to evaluate model performance.
This is a multi-class problem and the datasets can be expected to be quite imbalanced. Therefore, we will examine precision and recall by printing a classification report. We will also generate two confusion matrices - one normalised by the column (predicted label class) and the other normalised by the row (true label). This will help us see how misclassifications arise.
def evaluate_model_performance(data_folder, finbert_folder, label_dict,
chunking=False, max_length=512, appdx="",
save_ground_truth=False):
# define the bank of interest
banks = ['JPMorgan']
print("Which bank's data do you wish to label?")
bank = input(banks)
# load ground truth data
path_ground_truth = os.path.join(data_folder, f"ground_truth_{bank}_manual.xlsx")
df_ground_truth = pd.read_excel(path_ground_truth)
# load model labels
path_model = os.path.join(finbert_folder, f"finbert_topics_{bank}_chunking{chunking}_maxlength{max_length}{appdx}.csv")
df_finbert = pd.read_csv(path_model)
# merge the datasets on uid (multiple labels per uid possible)
df_merged = df_finbert.merge(df_ground_truth, on=['uid'], how='left', suffixes=['_model', '_true'])
# exclude rows with no ground truth
df_merged = df_merged.dropna(axis=0, subset=['true_topic'])
if save_ground_truth:
logit_cols = [col for col in df_merged.columns if 'logit' in col]
df_merged_short = df_merged.drop(['qa_text_model','qa_text_true',
'true_sentiment', 'true_Q_outcome',
'qa_num', 'finbert_topic_id', 'qa_type'] + logit_cols,
axis=1)
df_merged_short.to_csv(os.path.join(finbert_folder, f"finbert_topics_ground_truth_{bank}_chunking{chunking}_maxlength{max_length}_QA{appdx}.csv"),
index=False)
print(f"There are {df_merged.shape[0]} texts in the evaluated dataset.")
# get the topics covered in labelled and ground truth datasets
topics = np.unique(np.concatenate((df_merged['true_topic'],df_merged['finbert_topic_label'])))
# get classification report
report = classification_report(
df_merged['true_topic'],
df_merged['finbert_topic_label'],
zero_division=0
)
print(report)
# plot confusion matrix
fig, axes = plt.subplots(1,2, figsize=(8,3), sharey=True)
for i, (norm, tlt) in enumerate(zip(['pred', 'true'], ['Normalised by predicted', 'Normalised by true'])):
conf_matrix = confusion_matrix(df_merged['true_topic'],
df_merged['finbert_topic_label'],
normalize=norm)
sns.heatmap(conf_matrix, ax=axes[i], cmap='crest')
axes[i].set_xticklabels(topics, rotation=90)
axes[i].set_xlabel("Predicted label")
axes[i].set_title(tlt)
axes[0].set_ylabel("True label")
axes[0].set_yticklabels(topics, rotation=0)
# plot confusion matrix - just one for presentation
fig2, axes2 = plt.subplots(1,1, figsize=(9,8), sharey=True)
conf_matrix = confusion_matrix(df_merged['true_topic'],
df_merged['finbert_topic_label'],
normalize='true')
sns.heatmap(conf_matrix, ax=axes2, cmap='crest')
fsize=19
axes2.set_xticklabels(topics, rotation=90, fontsize=fsize)
axes2.set_xlabel("Predicted label", fontsize=fsize)
axes2.set_title("Confusion matrix", fontsize=fsize)
axes2.set_ylabel("True label", fontsize=fsize)
axes2.set_yticklabels(topics, rotation=0, fontsize=fsize)
cbar = axes2.collections[0].colorbar
cbar.ax.tick_params(labelsize=fsize)
cbar.set_label("Fraction of true label", fontsize=fsize)
plt.tight_layout()
2.3.1.1.4 Functions for plotting
2.3.1.2 Choosing the optimal chunking method
- get_merged_data_for_plotting() will allow the user to select the bank and the type of text (Q, A, Q+A, management discussion, or all) that is of interest. It will also merge labelled data with the full dataset, add a quarter_str column for easy selection of a specific quarter, and compute probabilities from logits for each text.
- softmax(logits) will compute probabilities of topic occurrence based on the multi-class logits provided by the model.
- get_optimal_subplot_dims(n) will compute the number of rows and columns needed in a set of subplots based on the total number of subplots.
- get_stats_stars(pval, p_thresholds=[0.05, 0.01, 0.001]) will express the statistical significance of a p-value through 1-3 stars based on significance thresholds (* p<0.05, ** p<0.01, *** p<0.001 by default).
def get_merged_data_for_plotting(finbert_folder, data_folder, label_dict,
chunking=True, max_length=512, bank='JPMorgan',
datatype='all', appdx="", summarised=False,
synthetic=False, sentiment=None, sentiment_folder=None):
"""
merge and subset datasets before plotting
datatype (str) can be 'Q' (questions only), 'A' (answers only),
'QA' (Q+A with qa_num only), 'presentation' (management discussion)
or 'all' (default; to proceed without subsetting )
"""
# load model labels
path_model = os.path.join(finbert_folder, f"finbert_topics_{bank}_chunking{chunking}_maxlength{max_length}{appdx}.csv")
df_finbert = pd.read_csv(path_model)
# load cleaned data
if summarised:
summarised_text_path = os.path.join(data_folder, f"phi_fulltable{appdx}.xlsx")
transcripts_all = pd.read_excel(summarised_text_path)
transcripts_all = transcripts_all.drop(['qa_text'], axis=1)
transcripts_all.rename(columns={'summarised_text': 'qa_text'}, inplace=True)
elif synthetic:
synthetic_text_path = os.path.join(data_folder, f"synthetic_data_for_finbert.csv")
transcripts_all = pd.read_csv(synthetic_text_path)
else:
transcript_csv_path = os.path.join(data_folder, f"transcripts_tabular_{bank}_clean.csv")
transcripts_all = pd.read_csv(transcript_csv_path)
transcripts_all = transcripts_all.drop(['qa_text'], axis=1)
# merge the datasets and drop unlabelled rows
df_merged = df_finbert.merge(transcripts_all, on=['uid'], how='left')
df_merged = df_merged.dropna(axis=0, subset=['finbert_topic_label'])
# compute probabilities
logit_cols = [f"topic_{i}_logit" for i in label_dict.keys()]
prob_cols = [f"topic_{i}_prob" for i in label_dict.keys()]
for pc in prob_cols:
df_merged[pc] = np.nan
df_merged[prob_cols] = softmax(df_merged[logit_cols].to_numpy(), axis=1)
if not synthetic:
# add quarters
df_merged['quarter_str'] = [f"{q}Q{str(y)[-2:]}" for q, y in zip(df_merged['quarter'], df_merged['year'])]
# order quarters
quarter_str_order = df_merged['quarter_str'].unique()[::-1]
df_merged['quarter_str'] = pd.Categorical(df_merged['quarter_str'], categories=quarter_str_order, ordered=True)
# print("QUARTER ORDER: ", quarter_str_order)
else:
quarter_str_order = None
# subsetting data type
if datatype in ["Q", "A"]:
df_merged = df_merged[df_merged['qa_type']==datatype].copy().reset_index(drop=True)
elif datatype == "QA":
df_merged = df_merged[~df_merged['qa_num'].isna()].copy().reset_index(drop=True)
elif datatype == "presentation":
df_merged = df_merged[df_merged['section']=='management_discussion'].copy().reset_index(drop=True)
if sentiment is not None and sentiment_folder is not None:
sentiment_path = os.path.join(sentiment_folder, "sentiment_full_result.csv")
sentiment_df = pd.read_csv(sentiment_path)
sentiment_colname = 'financial-roberta-large_sentiment'
sentiment_df = sentiment_df[['uid', sentiment_colname]]
df_merged_sentiment = pd.merge(df_merged, sentiment_df, on=['uid'], how='left')
if sentiment=='negative minus positive':
pass
else:
df_merged = df_merged_sentiment[df_merged_sentiment[sentiment_colname].eq(sentiment)].copy().reset_index()
return df_merged, quarter_str_order, (bank, datatype)
# derive probabilities from logits
def softmax(logits, axis=1):
# subtract the per-row max for numerical stability (the shift cancels out)
exp_logits = np.exp(logits - np.nanmax(logits, axis=axis, keepdims=True))
return exp_logits / exp_logits.sum(axis=axis, keepdims=True)
def get_optimal_subplot_dims(n):
rows = math.floor(math.sqrt(n))
cols = math.ceil(n / rows)
if rows * cols < n:
rows += 1
return rows, cols
def get_stats_stars(pval, p_thresholds=[0.05,0.01,0.001]):
p_thresholds = np.asarray(p_thresholds)
ptext = "*" * (pval<p_thresholds).sum()
if (pval<p_thresholds).sum()==0:
ptext="n.s."
return ptext
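As a quick sanity check, the two smaller helpers behave as follows (restated here so the snippet runs standalone; input values are illustrative):

```python
import math
import numpy as np

def get_optimal_subplot_dims(n):
    # same logic as the helper above
    rows = math.floor(math.sqrt(n))
    cols = math.ceil(n / rows)
    if rows * cols < n:
        rows += 1
    return rows, cols

def get_stats_stars(pval, p_thresholds=(0.05, 0.01, 0.001)):
    # same logic as the helper above
    n_stars = int((pval < np.asarray(p_thresholds)).sum())
    return "*" * n_stars if n_stars else "n.s."

print(get_optimal_subplot_dims(7))   # (2, 4): a 2x4 grid fits 7 subplots
print(get_stats_stars(0.004))        # '**' (p < 0.01 but not < 0.001)
print(get_stats_stars(0.2))          # 'n.s.'
```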
2.3.1.2.1 chunking=False, maxlength=512
# generate predictions
process_finbert(
finbert_tokeniser=finbert_topic_tokeniser,
finbert_model=finbert_topic_model,
save_folder=output_data_folder,
label_dict=id2label,
data_folder=output_data_folder,
chunking=False,
max_length=512,
qa_only=True
)
# evaluate model performance
evaluate_model_performance(
data_folder=processed_data_folder,
finbert_folder=output_data_folder,
label_dict=id2label,
chunking=False,
max_length=512,
save_ground_truth=True
)
Which bank's data do you wish to label?
['JPMorgan']JPMorgan
There are 108 texts in the evaluated dataset.
precision recall f1-score support
Company | Product News 0.25 0.06 0.10 16
Earnings 0.00 0.00 0.00 24
Fed | Central Banks 0.18 0.33 0.24 6
Financials 0.00 0.00 0.00 28
General News | Opinion 0.11 0.90 0.19 10
Legal | Regulation 0.00 0.00 0.00 7
M&A | Investments 0.00 0.00 0.00 7
Macro 0.00 0.00 0.00 3
Markets 0.00 0.00 0.00 5
Politics 0.00 0.00 0.00 0
Stock Commentary 0.00 0.00 0.00 0
Treasuries | Corporate Debt 0.50 0.50 0.50 2
accuracy 0.12 108
macro avg 0.09 0.15 0.09 108
weighted avg 0.07 0.12 0.05 108
Evidently, model accuracy is above chance level (5%).
Still, correct classification of just 12% of inputs cannot be considered a good performance.
Examining finer-grained performance metrics:
- General News | Opinion has very high recall (0.90) but fairly low precision (0.11), indicating that the model is prone to incorrectly applying this rather uninformative label to texts from almost all other topics (see the General News | Opinion column in the right heatmap).
- Politics and Stock Commentary do not seem affected by this misclassification only because there are no instances of these labels in the ground truth dataset.
- Texts belonging to Fed | Central Banks appear to be the least prone to being misclassified as General News | Opinion.
- Fed | Central Banks and Treasuries | Corporate Debt also have higher recall than most topics, and non-zero precision.
- All instances that the model classified as M&A | Investments, Macro, and Stock Commentary are labelled as Earnings in the ground truth dataset. In fact, no instance of Earnings in the ground truth dataset is labelled as such by the model.
- This potentially reflects different interpretations of the Earnings topic by the financial experts who labelled the data FinBERT was fine-tuned on and by the human labeller, with no background in finance, who created the "ground truth" dataset used here. In the current "ground truth" dataset, texts about NII in particular tended to be classified as Earnings.
- The model also has not classified any of these texts as Legal | Regulation (7 instances in the ground truth dataset) or Markets (5 instances).
- The ground truth labeller primarily struggled to discriminate between Financials and Earnings; Legal | Regulation and Fed | Central Banks; and Company | Product News and M&A | Investments. It is interesting that, rather than confusing specific pairs of classes, the model tends to label texts as General News | Opinion or (less often) Fed | Central Banks.
To assess model performance beyond its tendency to classify texts as "General News | Opinion", we will now define a function that ignores the probability of "General News | Opinion" and assigns labels based on the probabilities of the remaining topics.
2.3.1.2.2 Defining a function to reclassify texts ignoring "General News | Opinion"¶
Preventing any texts from being classified as "General News | Opinion" is not ideal, because it really is the most suitable topic in some cases. Based on the randomly sampled ground truth dataset, we could expect about 9% of the texts to truly reflect general news or opinion. Unfortunately, logit values do not allow us to separate this minority of cases from the many others where this topic is assigned erroneously. For example,
- All right. Thanks for all that. -- would fit under "General News | Opinion" and has a logit value for this topic of 3.77
- Very good. Appreciate the color and candor, as always. Thank you. -- would also fit under "General News | Opinion", but has a logit value of 5.70
- If we adjust [the consensus expense forecast] for the one-timers this year, that would suggest a core expense base that's just below $90 billion, so pretty healthy step-up in expenses. I know you've always had a strong commitment and discipline around investment. Just want to better understand where those incremental dollars are being deployed and just which investments are being prioritized in particular looking out to next year. -- This is about investments, yet its logit value (4.26) is between the two more general texts.
Therefore, a complete removal of this topic label will be the most effective approach.
def reclassify_finbert(finbert_folder, label_dict, exclude='General News | Opinion',
                       chunking=False, max_length=512, appdx=""):
    # define the bank of interest
    banks = ['JPMorgan']
    print("Which bank's data do you wish to label?")
    bank = input(banks)
    # load model labels
    path_model = os.path.join(finbert_folder, f"finbert_topics_{bank}_chunking{chunking}_maxlength{max_length}{appdx}.csv")
    path_model_new = os.path.join(finbert_folder, f"finbert_topics_{bank}_chunking{chunking}_maxlength{max_length}{appdx}_relabelled.csv")
    df_finbert = pd.read_csv(path_model)
    # find the key corresponding to the excluded label
    key_to_exclude = [key for key, val in label_dict.items() if val == exclude][0]
    # isolate logit cols, except the key to exclude
    logit_cols = [col for col in df_finbert.columns if "_logit" in col and f"{key_to_exclude}_logit" not in col]
    df_finbert['finbert_topic_id'] = [int(i.split("_")[1]) for i in df_finbert[logit_cols].idxmax(axis=1)]
    df_finbert['finbert_topic_label'] = [label_dict[i] for i in df_finbert['finbert_topic_id']]
    df_finbert.to_csv(path_model_new, index=False)
# reclassify the model
reclassify_finbert(
finbert_folder=output_data_folder,
label_dict=id2label,
chunking=False,
max_length=512,
exclude='General News | Opinion'
)
evaluate_model_performance(
data_folder=processed_data_folder,
finbert_folder=output_data_folder,
label_dict=id2label,
chunking=False,
max_length=512,
appdx='_relabelled'
)
Which bank's data do you wish to label?
['JPMorgan']JPMorgan
Which bank's data do you wish to label?
['JPMorgan']JPMorgan
There are 108 texts in the evaluated dataset.
precision recall f1-score support
Company | Product News 0.52 1.00 0.68 16
Earnings 0.00 0.00 0.00 24
Fed | Central Banks 0.12 0.33 0.18 6
Financials 0.17 0.04 0.06 28
General News | Opinion 0.00 0.00 0.00 10
Legal | Regulation 0.00 0.00 0.00 7
M&A | Investments 0.00 0.00 0.00 7
Macro 0.04 0.33 0.07 3
Markets 0.00 0.00 0.00 5
Politics 0.00 0.00 0.00 0
Stock Commentary 0.00 0.00 0.00 0
Treasuries | Corporate Debt 0.25 0.50 0.33 2
accuracy 0.19 108
macro avg 0.09 0.18 0.11 108
weighted avg 0.13 0.19 0.13 108
If texts are relabelled based on the max topic probability, ignoring "General News | Opinion", model accuracy increases from 12% to 19%.
The confusions are also more in line with the uncertainties of the human labeller:
- "Earnings" misclassified as "Macro", "Financials", "Stock Commentary"
- "Legal | Regulation" misclassified as "Fed | Central Banks"
- "Fed | Central Banks" misclassified as "Politics"
- "M&A | Investments" misclassified as "Company | Product News" (also as "Stock Commentary")
2.3.1.2.3 Defining a function to plot model logits for ground truth labels¶
A better way to visualise the behaviour of the model would be by looking beyond the assigned labels and plotting the logit density distributions of various ground truth labels directly.
The following function will take the subset of data with a particular ground truth label and plot the density distributions of model logits. Kernel density curves with peaks above 0 (chance level logit) will be highlighted and included in the legend of the plot.
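The peak-detection criterion can be sketched in isolation; the sample values below are illustrative, not model output:

```python
import numpy as np
from scipy.stats import gaussian_kde

# illustrative logit values for one topic (not real model output)
samples = np.array([1.2, 2.1, 1.8, 2.5, 0.9, 1.6])
x_range = np.linspace(-10, 10, 1000)
kde = gaussian_kde(samples, bw_method="scott")
# locate the x position of the density maximum
peak = x_range[np.argmax(kde(x_range))]
print(peak > 0)  # True: a curve like this would be highlighted and labelled
```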
def plot_model_logits(true_topic, data_folder, finbert_folder, label_dict,
                      chunking=False, max_length=512, appdx="", bank='JPMorgan',
                      save_topic=None):
    # load ground truth data
    path_ground_truth = os.path.join(data_folder, f"ground_truth_{bank}.csv")
    df_ground_truth = pd.read_csv(path_ground_truth)
    # load model labels
    path_model = os.path.join(finbert_folder, f"finbert_topics_{bank}_chunking{chunking}_maxlength{max_length}{appdx}.csv")
    df_finbert = pd.read_csv(path_model)
    # merge the datasets on uid (multiple labels per uid possible)
    df_merged = df_finbert.merge(df_ground_truth, on=['uid'], how='left', suffixes=['_model', '_true'])
    # exclude rows with no ground truth
    df_merged = df_merged.dropna(axis=0, subset=['true_topic'])
    logit_cols = [f"topic_{i}_logit" for i in label_dict.keys()]
    df_logits = df_merged[df_merged['true_topic'] == true_topic][logit_cols].copy()
    cmap = sns.color_palette("tab20")
    x_range = np.linspace(-10, 10, 1000)
    fig, ax = plt.subplots(1, 1, figsize=(4, 2))
    for col in df_logits.columns:
        i = int(col.split("_")[1])  # get topic label id
        kde = gaussian_kde(df_logits.loc[:, col].values, bw_method="scott")
        kde_values = kde(x_range)
        kde_peak = x_range[np.argmax(kde_values)]
        if kde_peak <= 0:
            # peak at or below chance level: plot in grey, keep out of the legend
            sns.kdeplot(df_logits.loc[:, col].values, color='grey', alpha=0.2, lw=1)
        else:
            sns.kdeplot(df_logits.loc[:, col].values, color=cmap[i], alpha=1, lw=2, label=label_dict[i])
    fsize = 12
    ax.set_title(f"True label: {true_topic}", fontsize=fsize)
    ax.set_xlabel("FinBERT logit", fontsize=fsize)
    ax.set_ylabel("Density", fontsize=fsize)
    plt.tick_params(axis='both', which='major', labelsize=fsize)
    plt.legend(loc='upper left', bbox_to_anchor=(1.02, 1), prop={'size': fsize})
for topic in ['Company | Product News', 'Earnings', 'Fed | Central Banks',
              'Financials', 'Legal | Regulation', 'M&A | Investments']:
    plot_model_logits(true_topic=topic,
                      data_folder=processed_data_folder,
                      finbert_folder=output_data_folder,
                      label_dict=id2label,
                      chunking=False,
                      max_length=512,
                      bank='JPMorgan')
These plots confirm that texts manually labelled as, for example, "M&A | Investments" often get classified as "Company | Product News", which is a pair of categories our human labeller often struggled to discriminate as well. Similarly, texts manually labelled as "Legal | Regulation" (which the model seems to rarely use for Q&A texts) often get classified as "Fed | Central Banks".
2.3.1.2.4 chunking=True, maxlength=512¶
With these arguments, longer texts (3% of the data) will be chunked into equal-sized pieces, so all of the text is considered for classification. Some uids will no longer be unique.
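The chunking step can be sketched as follows; `chunk_token_ids` is a hypothetical helper, not the notebook's actual implementation, and assumes the text has already been tokenised into ids:

```python
import math

def chunk_token_ids(token_ids, max_length=512):
    """Split a token-id sequence into near-equal chunks of at most max_length."""
    n_chunks = math.ceil(len(token_ids) / max_length)
    chunk_size = math.ceil(len(token_ids) / n_chunks)
    return [token_ids[i:i + chunk_size] for i in range(0, len(token_ids), chunk_size)]

# a 1100-token text exceeds max_length=512, so it is split into three near-equal chunks
chunks = chunk_token_ids(list(range(1100)), max_length=512)
print([len(c) for c in chunks])  # [367, 367, 366]
```

Each chunk keeps the original uid, which is why some uids stop being unique in the output.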
process_finbert(
finbert_tokeniser=finbert_topic_tokeniser,
finbert_model=finbert_topic_model,
save_folder=output_data_folder,
label_dict=id2label,
data_folder=processed_data_folder,
chunking=True,
max_length=512,
qa_only=False
)
evaluate_model_performance(
data_folder=processed_data_folder,
finbert_folder=output_data_folder,
label_dict=id2label,
chunking=True,
max_length=512
)
Which bank's data do you wish to label?
['JPMorgan']JPMorgan
There are 108 texts in the evaluated dataset.
precision recall f1-score support
Company | Product News 0.25 0.06 0.10 16
Earnings 0.00 0.00 0.00 24
Fed | Central Banks 0.18 0.33 0.24 6
Financials 0.00 0.00 0.00 28
General News | Opinion 0.11 0.90 0.19 10
Legal | Regulation 0.00 0.00 0.00 7
M&A | Investments 0.00 0.00 0.00 7
Macro 0.00 0.00 0.00 3
Markets 0.00 0.00 0.00 5
Politics 0.00 0.00 0.00 0
Stock Commentary 0.00 0.00 0.00 0
Treasuries | Corporate Debt 0.50 0.50 0.50 2
accuracy 0.12 108
macro avg 0.09 0.15 0.09 108
weighted avg 0.07 0.12 0.05 108
Since the evaluated dataset still contains 108 texts, either none of the chunked texts are part of the ground truth dataset, or all chunks of each ground-truth text received the same label. It is therefore unsurprising that the evaluation results are exactly the same as before.
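For completeness, one simple way to collapse chunk-level logits back to a single row per uid would be to average them; a sketch on made-up values (this is not what `process_finbert` does, which keeps duplicate uids, and the column names merely follow the `topic_{i}_logit` convention used above):

```python
import pandas as pd

# chunk-level output: uid "a" was split into two chunks, uid "b" was not chunked
df = pd.DataFrame({
    "uid": ["a", "a", "b"],
    "topic_0_logit": [1.0, 3.0, 2.0],
    "topic_1_logit": [0.0, 2.0, 4.0],
})
logit_cols = [c for c in df.columns if c.endswith("_logit")]
# average logits across chunks to recover one row per uid
df_agg = df.groupby("uid", as_index=False)[logit_cols].mean()
print(df_agg)
```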
We will relabel the output of this model for later use, but will not re-evaluate the relabelled outcome, since the above result already shows that there will be no difference to the previous re-evaluation (2.3.1.2.2).
# reclassify the model
reclassify_finbert(
finbert_folder=output_data_folder,
label_dict=id2label,
chunking=True,
max_length=512,
exclude='General News | Opinion'
)
Which bank's data do you wish to label? ['JPMorgan']JPMorgan
2.3.1.2.5 chunking=True, maxlength=128¶
We will also try shorter chunks to see if they help reduce the prevalence of the General News | Opinion label and improve model accuracy.
First, let us check how many texts will get chunked now.
plot_finbert_token_lengths(processed_data_folder, finbert_topic_tokeniser, max_length=128)
Which bank's data do you wish to label? ['JPMorgan']JPMorgan Max token length is exceeded by 368 (41%) entries
Now, we will generate the labels.
process_finbert(
finbert_tokeniser=finbert_topic_tokeniser,
finbert_model=finbert_topic_model,
save_folder=output_data_folder,
label_dict=id2label,
data_folder=processed_data_folder,
chunking=True,
max_length=128,
qa_only=False
)
Finally, we can evaluate the model's performance on this chunked dataset.
evaluate_model_performance(
data_folder=processed_data_folder,
finbert_folder=output_data_folder,
label_dict=id2label,
chunking=True,
max_length=128,
)
Which bank's data do you wish to label?
['JPMorgan']JPMorgan
There are 128 texts in the evaluated dataset.
precision recall f1-score support
Company | Product News 0.33 0.12 0.17 17
Earnings 0.00 0.00 0.00 26
Fed | Central Banks 0.12 0.29 0.17 7
Financials 0.00 0.00 0.00 39
General News | Opinion 0.11 0.83 0.20 12
Legal | Regulation 0.00 0.00 0.00 10
M&A | Investments 0.00 0.00 0.00 7
Macro 0.00 0.00 0.00 3
Markets 0.00 0.00 0.00 5
Politics 0.00 0.00 0.00 0
Stock Commentary 0.00 0.00 0.00 0
Treasuries | Corporate Debt 0.50 0.50 0.50 2
accuracy 0.12 128
macro avg 0.09 0.14 0.09 128
weighted avg 0.07 0.12 0.06 128
Model accuracy has not improved.
Many texts still get misclassified as General News | Opinion. We can check model performance if this topic is ignored.
# reclassify the model
reclassify_finbert(
finbert_folder=output_data_folder,
label_dict=id2label,
chunking=True,
max_length=128,
exclude='General News | Opinion'
)
evaluate_model_performance(
data_folder=processed_data_folder,
finbert_folder=output_data_folder,
label_dict=id2label,
chunking=True,
max_length=128,
appdx='_relabelled'
)
Which bank's data do you wish to label?
['JPMorgan']JPMorgan
Which bank's data do you wish to label?
['JPMorgan']JPMorgan
There are 128 texts in the evaluated dataset.
precision recall f1-score support
Company | Product News 0.41 0.94 0.57 17
Earnings 0.00 0.00 0.00 26
Fed | Central Banks 0.10 0.29 0.14 7
Financials 0.14 0.03 0.04 39
General News | Opinion 0.00 0.00 0.00 12
Legal | Regulation 0.00 0.00 0.00 10
M&A | Investments 0.00 0.00 0.00 7
Macro 0.03 0.33 0.06 3
Markets 0.00 0.00 0.00 5
Politics 0.00 0.00 0.00 0
Stock Commentary 0.00 0.00 0.00 0
Treasuries | Corporate Debt 0.20 0.50 0.29 2
accuracy 0.16 128
macro avg 0.07 0.17 0.09 128
weighted avg 0.11 0.16 0.10 128
Ignoring General News | Opinion, the model reaches 16% accuracy: better than the 12% achieved with the label included, but slightly worse than the relabelled model with maxlength=512 (19%).
It is also worth checking the model logit distributions for the most frequent ground truth labels.
for topic in ['Company | Product News', 'Earnings', 'Fed | Central Banks',
              'Financials', 'Legal | Regulation', 'M&A | Investments']:
    plot_model_logits(true_topic=topic,
                      data_folder=processed_data_folder,
                      finbert_folder=output_data_folder,
                      label_dict=id2label,
                      chunking=True,
                      max_length=128,
                      bank='JPMorgan')
As before, topics with the best logits (KDE peak above the chance level logit of 0) seem to be in good conceptual alignment with ground truth labels. Similarities between some KDE curves also illustrate the potential downsides of a winner-takes-all classification.
Therefore, all further analyses will be based on model logits, not discrete labels.
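For later reference, per-topic logits can be converted into the probability distributions used downstream with a softmax; a minimal sketch on made-up values:

```python
import numpy as np

def softmax(logits):
    """Convert a vector of logits into a probability distribution."""
    z = np.asarray(logits, dtype=float)
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

probs = softmax([2.0, 0.5, -1.0])  # illustrative per-topic logits
print(probs.round(3))  # sums to 1, preserves the ordering of the logits
```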
2.3.1.3 Using summarised texts¶
A reason for texts being misclassified as "General News | Opinion" might be the conversational language used in the Q&A sections of the calls. To see if a change in tone mitigates this flaw, we will apply the model to texts summarised using Phi-3.5.
# checking token lengths of summarised texts
plot_finbert_token_lengths(processed_data_folder, finbert_topic_tokeniser, max_length=512, summarised=True,
appdx="_summarised")
Which bank's data do you wish to label? ['JPMorgan']JPMorgan Max token length is exceeded by 0 (0%) entries
As required by the summarisation prompt, all texts are now short, so chunking will not be necessary.
# topic label prediction on summarised texts
process_finbert(
finbert_tokeniser=finbert_topic_tokeniser,
finbert_model=finbert_topic_model,
save_folder=output_data_folder,
label_dict=id2label,
data_folder=processed_data_folder,
chunking=False,
max_length=512,
qa_only=False,
summarised=True,
appdx="_summarised"
)
# evaluating model performance on summarised texts
evaluate_model_performance(
data_folder=processed_data_folder,
finbert_folder=output_data_folder,
label_dict=id2label,
chunking=False,
max_length=512,
appdx='_summarised',
save_ground_truth=True
)
Which bank's data do you wish to label?
['JPMorgan']JPMorgan
There are 108 texts in the evaluated dataset.
precision recall f1-score support
Company | Product News 0.38 0.88 0.53 16
Earnings 0.00 0.00 0.00 24
Fed | Central Banks 0.16 0.50 0.24 6
Financials 0.36 0.14 0.21 28
General News | Opinion 0.26 0.50 0.34 10
Legal | Regulation 0.00 0.00 0.00 7
M&A | Investments 0.00 0.00 0.00 7
Macro 0.00 0.00 0.00 3
Markets 1.00 0.20 0.33 5
Treasuries | Corporate Debt 0.50 0.50 0.50 2
accuracy 0.26 108
macro avg 0.27 0.27 0.22 108
weighted avg 0.24 0.26 0.20 108
Accuracy has increased to 26%! The model has become less prone to labelling texts as "General News | Opinion". Interestingly, many of its misclassifications now fall into the "Company | Product News" category.
Let us check if relabelling based on model logits, while ignoring "General News | Opinion", still improves accuracy.
# reclassify the model
reclassify_finbert(
finbert_folder=output_data_folder,
label_dict=id2label,
chunking=False,
max_length=512,
exclude='General News | Opinion',
appdx='_summarised'
)
evaluate_model_performance(
data_folder=processed_data_folder,
finbert_folder=output_data_folder,
label_dict=id2label,
chunking=False,
max_length=512,
appdx='_summarised_relabelled'
)
Which bank's data do you wish to label?
['JPMorgan']JPMorgan
Which bank's data do you wish to label?
['JPMorgan']JPMorgan
There are 108 texts in the evaluated dataset.
precision recall f1-score support
Company | Product News 0.32 0.94 0.48 16
Earnings 0.00 0.00 0.00 24
Fed | Central Banks 0.14 0.50 0.22 6
Financials 0.36 0.14 0.21 28
General News | Opinion 0.00 0.00 0.00 10
Legal | Regulation 0.00 0.00 0.00 7
M&A | Investments 0.00 0.00 0.00 7
Macro 0.00 0.00 0.00 3
Markets 1.00 0.20 0.33 5
Treasuries | Corporate Debt 0.33 0.50 0.40 2
accuracy 0.22 108
macro avg 0.22 0.23 0.16 108
weighted avg 0.20 0.22 0.16 108
Accuracy dropped to 22%. This makes sense, because on summarised texts the model was already less prone to labelling texts as "General News | Opinion". Instead, texts are now most often misclassified as "Company | Product News". We will therefore check how accuracy changes if this topic is removed.
# reclassify the model
reclassify_finbert(
finbert_folder=output_data_folder,
label_dict=id2label,
chunking=False,
max_length=512,
exclude='Company | Product News',
appdx='_summarised'
)
evaluate_model_performance(
data_folder=processed_data_folder,
finbert_folder=output_data_folder,
label_dict=id2label,
chunking=False,
max_length=512,
appdx='_summarised_relabelled'
)
Which bank's data do you wish to label?
['JPMorgan']JPMorgan
Which bank's data do you wish to label?
['JPMorgan']JPMorgan
There are 108 texts in the evaluated dataset.
precision recall f1-score support
Company | Product News 0.00 0.00 0.00 16
Earnings 0.00 0.00 0.00 24
Fed | Central Banks 0.16 0.50 0.24 6
Financials 0.29 0.18 0.22 28
General News | Opinion 0.17 0.70 0.27 10
Legal | Regulation 0.00 0.00 0.00 7
M&A | Investments 0.00 0.00 0.00 7
Macro 0.05 0.33 0.08 3
Markets 1.00 0.20 0.33 5
Politics 0.00 0.00 0.00 0
Treasuries | Corporate Debt 0.33 0.50 0.40 2
accuracy 0.17 108
macro avg 0.18 0.22 0.14 108
weighted avg 0.15 0.17 0.12 108
Ignoring "Company | Product News" yielded 17% accuracy. The inadequacy of such a blanket removal is not surprising, given that this topic is genuinely quite prevalent in the dataset.
Overall, it is a good sign that ignoring whole labels no longer helps improve the accuracy.
As a last check, let us look at the model logit distributions for the most common topics.
for topic in ['Company | Product News', 'Earnings', 'Fed | Central Banks',
              'Financials', 'Legal | Regulation', 'M&A | Investments']:
    plot_model_logits(true_topic=topic,
                      data_folder=processed_data_folder,
                      finbert_folder=output_data_folder,
                      label_dict=id2label,
                      chunking=False,
                      max_length=512,
                      bank='JPMorgan',
                      appdx='_summarised',
                      save_topic='Legal | Regulation')
As before, topic probability distributions illustrate that several topics can be associated with texts of a particular "true label", and the margins for misclassification are likely small. Thus, the use of probability distributions, rather than discrete labels, will provide a more complete view of the information contained in the texts.
Overall, we will proceed by using:
- FinBERT output on summarised texts to analyse the Q&A portions of the transcripts because of the better performance;
- FinBERT output on raw texts (max_length=512) to analyse the presentation portions of the transcripts, because executives' summaries cover a wide range of topics and a two-sentence summary of several pages of text does not capture this topic diversity adequately.
In both cases, we will use topic probability distributions, not discrete labels.
2.3.1.4 Comparison¶
# modified evaluate_model_performance function for side-by-side output
def compare_model_performance(data_folder, finbert_folder, label_dict,
                              chunking=False, max_length=512, appdx="",
                              save_ground_truth=False):
    bank = 'JPMorgan'
    appdx_dict = {'_relabelled': 'Raw texts', '_summarised': 'Summarised texts'}
    # load ground truth data
    path_ground_truth = os.path.join(data_folder, f"ground_truth_{bank}.csv")
    df_ground_truth = pd.read_csv(path_ground_truth)
    # load model labels
    path_model = os.path.join(finbert_folder, f"finbert_topics_{bank}_chunking{chunking}_maxlength{max_length}{appdx}.csv")
    df_finbert = pd.read_csv(path_model)
    # merge the datasets on uid (multiple labels per uid possible)
    df_merged = df_finbert.merge(df_ground_truth, on=['uid'], how='left', suffixes=['_model', '_true'])
    # exclude rows with no ground truth
    df_merged = df_merged.dropna(axis=0, subset=['true_topic'])
    if save_ground_truth:
        logit_cols = [col for col in df_merged.columns if 'logit' in col]
        df_merged_short = df_merged.drop(['qa_text_model', 'qa_text_true',
                                          'true_sentiment', 'true_Q_outcome',
                                          'qa_num', 'finbert_topic_id', 'qa_type'] + logit_cols,
                                         axis=1)
        df_merged_short.to_csv(os.path.join(data_folder, f"finbert_topics_ground_truth_{bank}_chunking{chunking}_maxlength{max_length}_QA{appdx}.csv"),
                               index=False)
    # get the topics covered in labelled and ground truth datasets
    topics = np.unique(np.concatenate((df_merged['true_topic'], df_merged['finbert_topic_label'])))
    # plot a single confusion matrix for presentation
    fig2, axes2 = plt.subplots(1, 1, figsize=(9, 8), sharey=True)
    conf_matrix = confusion_matrix(df_merged['true_topic'],
                                   df_merged['finbert_topic_label'],
                                   normalize='true')
    sns.heatmap(conf_matrix, ax=axes2, cmap='crest')
    fsize = 19
    axes2.set_xticklabels(topics, rotation=90, fontsize=fsize)
    axes2.set_xlabel("Predicted label", fontsize=fsize)
    axes2.set_ylabel("True label", fontsize=fsize)
    axes2.set_yticklabels(topics, rotation=0, fontsize=fsize)
    cbar = axes2.collections[0].colorbar
    cbar.ax.tick_params(labelsize=fsize)
    cbar.set_label("Fraction of true label", fontsize=fsize)
    axes2.set_title(appdx_dict[appdx], fontsize=40)
    plt.tight_layout()
    fig2.savefig(f"finbert-topic-{bank}-chunking{chunking}-maxlength{max_length}{appdx}.png",
                 dpi=300, bbox_inches="tight")
    # plot confusion matrices normalised by predicted and true labels
    fig, axes = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
    for i, (norm, tlt) in enumerate(zip(['pred', 'true'], ['Normalised by predicted', 'Normalised by true'])):
        conf_matrix = confusion_matrix(df_merged['true_topic'],
                                       df_merged['finbert_topic_label'],
                                       normalize=norm)
        sns.heatmap(conf_matrix, ax=axes[i], cmap='crest')
        axes[i].set_xticklabels(topics, rotation=90)
        axes[i].set_xlabel("Predicted label")
        axes[i].set_title(tlt)
    axes[0].set_ylabel("True label")
    axes[0].set_yticklabels(topics, rotation=0)
    plt.show()
    # print the classification report
    report = classification_report(
        df_merged['true_topic'],
        df_merged['finbert_topic_label'],
        zero_division=0
    )
    print(report)
out1 = Output()
out2 = Output()
with out1:
    compare_model_performance(
        data_folder=processed_data_folder,
        finbert_folder=output_data_folder,
        label_dict=id2label,
        chunking=False,
        max_length=512,
        appdx='_relabelled'
    )
with out2:
    compare_model_performance(
        data_folder=processed_data_folder,
        finbert_folder=output_data_folder,
        label_dict=id2label,
        chunking=False,
        max_length=512,
        appdx='_summarised',
        save_ground_truth=True
    )
display(HBox([out1, out2]))
2.3.2 BERTopic¶
The flexibility of BERTopic allows us to choose the model for each of its components. We use PCA for dimensionality reduction, KMeans for clustering, and all-MiniLM-L6-v2 as the embedding model.
For the embedding model, OpenAI embeddings and Google's universal-sentence-encoder were also tried, but the differences in results were small, so we chose all-MiniLM-L6-v2 for its speed.
# load the embedding_model (same for all runs)
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
2.3.2.1 Pre-processing function¶
def preprocess_spacy(text):
    # Load spaCy language model
    nlp = spacy.load("en_core_web_sm")
    # Define a custom list of stop words
    custom_stopwords = set([
        "and", "billions", "due", "jeremy", "text", "net", "qt", "billion", "million", "think", "thank", "year", "years",
        "month", "months", "yeah", "okay", "go", "ok", "hi", "good", "hey", "morning", "sure", "jamie", "jim", "like",
        "thing", "bit", "little", "key", "ceo", "got", "lot", "guy"
    ])
    # Define the target part-of-speech tags to keep and consolidate
    target_pos = {"NOUN"}  # In this example, we choose to consolidate nouns in their lemmatised form
    # Process the text as a spaCy document
    doc = nlp(text)
    # Create an empty set to store consolidated tokens
    consolidated_tokens = set()
    # Perform preprocessing
    for token in doc:
        lemma = token.lemma_.lower()  # Use the lemmatised, lowercase form of the word
        # Filtering: remove default stop words, custom stop words, punctuation, whitespace, numbers, and single characters
        if (not token.is_stop                      # Default stop words
                and lemma not in custom_stopwords  # Custom stop words
                and not token.is_punct             # Punctuation
                and not token.is_space             # Whitespace
                and not token.like_num             # Numbers
                and len(lemma) > 1):               # Single characters
            # If the part of speech is a target type (e.g. nouns), consolidate by adding the lemmatised form
            if token.pos_ in target_pos:
                consolidated_tokens.add(lemma)
            else:
                # Other parts of speech are added as they are
                consolidated_tokens.add(token.text.lower())
    # Join tokens back to a string
    text = " ".join(consolidated_tokens)
    return text
2.3.2.2 Running BERTopic on the summarised dataset¶
We have already seen that summarisation greatly improves topic classification, so we will run BERTopic on the summarised dataset.
# phi full table summarised data.xlsx
df_phi_fulltable_summarised = pd.read_excel(processed_data_folder + '/phi_fulltable_summarised.xlsx')
# phi_ground_truth_summarised data.xlsx
df_phi_ground_truth_summarised = pd.read_excel(processed_data_folder + '/phi_ground_truth_summarised.xlsx')
# apply preprocessing
df_phi_fulltable_summarised_list = df_phi_fulltable_summarised['summarised_text'].apply(preprocess_spacy).to_list()
# running BERTopic
dim_model = PCA(n_components=5)
cluster_model = KMeans(n_clusters=20)
topic_model_phi_summarised = BERTopic(umap_model=dim_model, embedding_model=embedding_model,
hdbscan_model=cluster_model, calculate_probabilities=True)
topics, probabilities = topic_model_phi_summarised.fit_transform(df_phi_fulltable_summarised_list)
# Plot each BERTopic visualisation and save it to HTML
topic_model_phi_summarised.visualize_topics().write_html("topic_model_phi_summarised_topic.html")
topic_model_phi_summarised.visualize_barchart(top_n_topics=20,n_words=8, autoscale=True).write_html("topic_model_phi_summarised_barchart.html")
topic_model_phi_summarised.visualize_heatmap().write_html("topic_model_phi_summarised_heatmap.html")
topic_model_phi_summarised.visualize_hierarchy().write_html("topic_model_phi_summarised_hierarchy.html")
# Load each plot into a subplot
#display(HTML("topic_model_phi_summarised_topic.html"))
display(HTML("topic_model_phi_summarised_barchart.html"))
display(HTML("topic_model_phi_summarised_heatmap.html"))
display(HTML("topic_model_phi_summarised_hierarchy.html"))
plt.tight_layout()
plt.show()
2.4 Q/A Evasion and Generalisability of Phi-3.5¶
2.4.0 Using FinBERT classification for Q/A evasion¶
Using the FinBERT output, we can perform a very basic analysis to see if the topics of questions and answers match.
We will correlate the topic probability distributions of all question-answer pairs. This will create a distribution of correlation coefficients.
We will repeat this analysis on data with shuffled Q-A relationships.
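The logic of the matched-versus-shuffled comparison can be illustrated on a toy example (all numbers below are synthetic): answers that partly echo their question's topic distribution correlate more strongly with the matched question than with a randomly paired one.

```python
import numpy as np

rng = np.random.default_rng(0)
questions = rng.dirichlet(np.ones(12), size=50)                        # 50 questions, 12 topics
answers = 0.7 * questions + 0.3 * rng.dirichlet(np.ones(12), size=50)  # answers partly echo questions

# correlate each answer with its matched question, and with a shuffled pairing
matched = [np.corrcoef(q, a)[0, 1] for q, a in zip(questions, answers)]
shuffled = [np.corrcoef(q, a)[0, 1]
            for q, a in zip(questions, answers[rng.permutation(50)])]
print(np.mean(matched) > np.mean(shuffled))  # True: matched pairs align better
```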
def plot_answers_to_questions(finbert_folder, data_folder,
                              label_dict, chunking=True, max_length=512,
                              bank="JPMorgan", datatype="QA", appdx="", save=False,
                              summarised=False, synthetic=False, exclude=None):
    df_merged, _, metadata = get_merged_data_for_plotting(finbert_folder, data_folder,
                                                          label_dict, chunking=chunking, max_length=max_length,
                                                          bank=bank, datatype=datatype, appdx=appdx,
                                                          summarised=summarised, synthetic=synthetic)
    # subset QAs
    df_merged = df_merged[~df_merged['qa_num'].isna()].reset_index(drop=True).copy()
    # select only Qs that have As
    mask = df_merged.groupby("qa_num")["qa_type"].transform(lambda x: {'A', 'Q'}.issubset(set(x)))
    df_merged = df_merged[mask]
    # optionally exclude a topic (e.g. General News | Opinion)
    if exclude is not None:
        gen_key = [key for key, val in label_dict.items() if val == exclude][0]
        # define probability columns, dropping the excluded topic
        prob_cols = [col for col in df_merged.columns if 'prob' in col and f'topic_{gen_key}' not in col]
    else:
        prob_cols = [col for col in df_merged.columns if 'prob' in col]
    # define all columns to keep and subset the df
    subset_cols = prob_cols + ['uid', 'qa_num', 'qa_type', 'finbert_topic_label']
    df_qa = df_merged[subset_cols]
    # split answers and questions; shuffle question numbers for the control comparison
    df_answers = df_qa[df_qa['qa_type'] == 'A'].reset_index(drop=True).copy()
    df_questions = df_qa[df_qa['qa_type'] == 'Q'].reset_index(drop=True).copy()
    df_questions['qa_num_shuffled'] = df_questions['qa_num'].sample(frac=1).reset_index(drop=True)
    # add cols to fill
    df_answers['qa_corr'] = np.nan
    df_answers['qa_corr_shuffled'] = np.nan
    # iterate over answers
    for i, row in df_answers.iterrows():
        answer = np.array([row[col] for col in prob_cols])
        question = df_questions[df_questions['qa_num'] == row.qa_num][prob_cols].values.flatten()
        question_shuffled = df_questions[df_questions['qa_num_shuffled'] == row.qa_num][prob_cols].values.flatten()
        df_answers.at[i, 'qa_corr'] = pearsonr(answer, question)[0]
        df_answers.at[i, 'qa_corr_shuffled'] = pearsonr(answer, question_shuffled)[0]
    # stats
    _, pval = mannwhitneyu(df_answers['qa_corr'], df_answers['qa_corr_shuffled'], alternative='two-sided')
    print(f"p-val: {pval}")
    # plot
    fig, ax = plt.subplots(1, 1, figsize=(4, 2))
    sns.kdeplot(df_answers['qa_corr'], ax=ax, color='#FE7F2D', alpha=0.6, label='Data', lw=2)
    sns.kdeplot(df_answers['qa_corr_shuffled'], ax=ax, color='#233D4D', alpha=0.6, label='Shuffled', lw=2, ls='dashed')
    ax.set_xlabel("Pearson correlation coefficient")
    plt.legend(loc='upper left', bbox_to_anchor=(1.02, 1))
    # save correlation data
    if save:
        df_for_saving = df_answers[['uid', 'qa_num', 'qa_corr']]
        if synthetic:
            df_for_saving['uid'] = ["_".join(x.split("_")[:-1]) for x in df_for_saving['uid']]
        path_for_saving = os.path.join(finbert_folder, f"finbert_QA_topic_correlations{appdx}.csv")
        df_for_saving.to_csv(path_for_saving, index=False)
# Checking question evasiveness on raw text
plot_answers_to_questions(finbert_folder=output_data_folder, data_folder=processed_data_folder,
label_dict=id2label, chunking=False, max_length=512,
bank="JPMorgan", datatype="QA", appdx="",
exclude="General News | Opinion", save=True)
p-val: 1.2490241106466923e-19
# Now - on text summarised by Phi-3.5
plot_answers_to_questions(finbert_folder=output_data_folder, data_folder=processed_data_folder,
label_dict=id2label, chunking=False, max_length=512,
bank="JPMorgan", datatype="QA", appdx="_summarised",
summarised=True, save=True)
p-val: 3.095980672406378e-13
With both datasets, topic alignment is significantly better for the true Q-A pairs than for the shuffled pairs. Interestingly, text summarisation has resulted in equal-height peaks near Q-A topic correlation coefficients of 0 and 1, suggesting similar numbers of answered and avoided questions. The analysis of raw text was a bit more complimentary to the executives.
2.4.1 Phi-3.5 pipeline initialisation and helper functions¶
# Initialise the pipeline - note T4 GPU does not contain enough RAM, so use CPU and ignore warning OR run on A100 GPU
pipe = pipeline("text-generation", model="microsoft/Phi-3.5-mini-instruct", trust_remote_code=True, device=0)
2.4.1.0 Preparing datasets for Phi-3.5 (with results from other analyses)¶
"""
Aggregation function to merge Questions with their Answers
- all answers to a question will be merged into one
- each qa_text row will now start with the questions asked and then continue into all of the answers given
- this allows for assessment of question evasion in the answers
- assumes RoBERTa Sentiment and finBERT(qa_corr) columns already included
"""
# Function to aggregate Q and A pairs into single rows for question-evasion testing (test data only)
def aggregate_test_data(df):
# Select columns to keep for test data
df = df[['uid', 'qa_type', 'qa_num', 'qa_text', 'true_sentiment', 'true_topic', 'true_Q_outcome', 'RoBERTa Sentiment', 'qa_corr', 'qa_corr_summarised', 'finBERT Topic Classification', 'finBERT(summarised) Topic Classification']].copy()
# Convert 'qa_num' to numeric, forcing non-numeric values to NaN, then drop rows where 'qa_num' is NaN
# Note this will exclude all 'N' type rows, which is required for QA pair analysis
df['qa_num'] = pd.to_numeric(df['qa_num'], errors='coerce')
df = df.dropna(subset=['qa_num'])
# Convert qa_num, qa_corr to integers/floats and qa_text to strings
df.loc[:, 'qa_num'] = df['qa_num'].astype(int)
df.loc[:, 'qa_corr'] = df['qa_corr'].astype(float)
df.loc[:, 'qa_corr_summarised'] = df['qa_corr_summarised'].astype(float)
df.loc[:, 'qa_text'] = df['qa_text'].astype(str)
# Group by 'qa_num', with different approach for each column
def custom_aggregation(group):
# Aggregate qa_text by joining texts for each question
aggregated_text = ' '.join(group['qa_text'])
# For true_Q_outcome, take the first value
true_Q_outcome = group['true_Q_outcome'].iloc[0] if 'true_Q_outcome' in group else None
# For qa_corrs, take the mean
qa_corr_value = group['qa_corr'].mean() if 'qa_corr' in group else None
qa_corr_value_summarised = group['qa_corr_summarised'].mean() if 'qa_corr_summarised' in group else None
# For true_topic, true_sentiment, finBERT and RoBERTa Sentiment, only take values where qa_type is 'A'
true_topic = group.loc[group['qa_type'] == 'A', 'true_topic'].unique().tolist() if 'true_topic' in group else []
true_sentiment = group.loc[group['qa_type'] == 'A', 'true_sentiment'].unique().tolist() if 'true_sentiment' in group else []
roberta_sentiment = group.loc[group['qa_type'] == 'A', 'RoBERTa Sentiment'].unique().tolist() if 'RoBERTa Sentiment' in group else []
finBERT_topics_1 = group.loc[group['qa_type'] == 'A', 'finBERT Topic Classification'].unique().tolist() if 'finBERT Topic Classification' in group else []
finBERT_topics_summarised = group.loc[group['qa_type'] == 'A', 'finBERT(summarised) Topic Classification'].unique().tolist() if 'finBERT(summarised) Topic Classification' in group else []
# Print uids for manual review where multiple sentiments/topics in answer parts
check_list = [true_sentiment, roberta_sentiment, true_topic, finBERT_topics_1, finBERT_topics_summarised]
check_names = ['true_sentiment', 'RoBERTa Sentiment', 'true_topic', 'finBERT Topic Classification', 'finBERT(summarised) Topic Classification']
manual_review_list = []
for name, check in zip(check_names, check_list):
if len(check) > 1:
manual_review_list.append(f"{group['uid'].iloc[0]} ({name})")
# Remove duplicates and only print when there is something to review
clean_manual_review_list = sorted(set(manual_review_list))
if clean_manual_review_list:
print(clean_manual_review_list)
# Convert lists to strings (removing brackets) for values expected to be singular values
true_topic = ', '.join(true_topic)
true_sentiment = ', '.join(true_sentiment)
roberta_sentiment = ', '.join(roberta_sentiment)
finBERT_topics_1 = ', '.join(finBERT_topics_1)
finBERT_topics_summarised = ', '.join(finBERT_topics_summarised)
# Return a dictionary of aggregated values
return pd.Series({
'uid': group['uid'].iloc[0].replace('_Q_', '_QA_'),
'qa_type': 'QA',
'qa_num': group['qa_num'].iloc[0],
'qa_text': aggregated_text,
'true_sentiment': true_sentiment,
'true_topic': true_topic,
'true_Q_outcome': true_Q_outcome,
'RoBERTa Sentiment': roberta_sentiment,
'qa_corr': qa_corr_value,
'finBERT Topic Classification': finBERT_topics_1,
'qa_corr_summarised': qa_corr_value_summarised,
'finBERT(summarised) Topic Classification': finBERT_topics_summarised
})
# Apply custom aggregation
aggregated_df = df.groupby('qa_num', as_index=False).apply(custom_aggregation).reset_index(drop=True)
# Rename columns and set index to 'uid' for both dfs
for table in [aggregated_df, df]:
for col in table.columns:
if col == 'true_sentiment': table.rename(columns={'true_sentiment': 'True Sentiment'}, inplace=True)
if col == 'true_topic': table.rename(columns={'true_topic': 'True Topic'}, inplace=True)
if col == 'true_Q_outcome': table.rename(columns={'true_Q_outcome': 'True Evasion Present'}, inplace=True)
# Visualise DataFrames
print("\nOriginal DataFrame:", len(df), "rows")
display(df.head())
print("\nAggregated QA DataFrame:", len(aggregated_df), "rows")
display(aggregated_df.head())
return df, aggregated_df
"""
Aggregation function to merge Questions with their Answers
- all answers to a question will be merged into one
- each qa_text row will now start with the questions asked and then continue into all of the answers given
- this allows for assessment of question evasion in the answers
"""
# Function to aggregate Q and A pairs into single rows for question-evasion testing (Non-test data only)
def aggregate_QA_data(df):
# Convert 'qa_num' to numeric, forcing non-numeric values to NaN, then drop rows where 'qa_num' is NaN
# Note this will exclude all 'N' type rows, which is required for QA pair analysis
df.loc[:, 'qa_num'] = pd.to_numeric(df['qa_num'], errors='coerce')
df = df.dropna(subset=['qa_num'])
# Convert qa_num to integers and qa_text to strings
df.loc[:, 'qa_num'] = df['qa_num'].astype(int)
df.loc[:, 'qa_text'] = df['qa_text'].astype(str)
# Check
print("\nOriginal DataFrame:", len(df), "rows")
display(df.head())
# Only keep essential columns: 'uid', 'qa_type', 'qa_num', and 'qa_text' (common to all pre-processed files) for real data
df = df[['uid', 'qa_type', 'qa_num', 'qa_text']].copy()
# Group by 'qa_num' and aggregate the 'qa_text' column by joining texts for each question
df.loc[:, 'qa_text'] = df.groupby('qa_num')['qa_text'].transform(lambda x: ' '.join(x))
# Drop duplicates to keep only one row per 'qa_num' with the aggregated text
df = df.drop_duplicates(subset=['qa_num'])
# Convert all qa_type entries to QA and 'uid' to QA
df.loc[:, 'qa_type'] = 'QA'
df['uid'] = df['uid'].str.replace('_Q_', '_QA_')
# Set index to uid and drop index column
df = df.set_index('uid', drop=True)
# Check
print("\nAggregated QA DataFrame:", len(df), "rows")
display(df.head())
return df
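On a toy table, the transform-then-drop_duplicates pattern used in aggregate_QA_data collapses the Q and A rows for each qa_num into one aggregated text (the data here is illustrative only):

```python
import pandas as pd

toy = pd.DataFrame({
    'uid': ['t_Q_1', 't_A_1', 't_Q_2', 't_A_2'],
    'qa_type': ['Q', 'A', 'Q', 'A'],
    'qa_num': [1, 1, 2, 2],
    'qa_text': ['How is NII?', 'NII rose 3%.', 'Any M&A plans?', 'No comment.'],
})
# Join all texts within each qa_num, then keep only the first row per question
toy['qa_text'] = toy.groupby('qa_num')['qa_text'].transform(lambda x: ' '.join(x))
toy = toy.drop_duplicates(subset=['qa_num'])
print(toy['qa_text'].tolist())
# → ['How is NII? NII rose 3%.', 'Any M&A plans? No comment.']
```

Because drop_duplicates keeps the first occurrence, each surviving row carries the question's uid while its text already contains the answers appended after the question.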
# Function to derive a finBERT Evasion label from qa_corr or qa_corr_summarised (values above 0.9 become 'Not Evasive', otherwise 'Evasive')
def calculate_finBERT_evasion(df):
# Convert qa_corr to float type
if 'qa_corr' in df.columns:
df['qa_corr'] = df['qa_corr'].astype(float)
df['qa_corr'] = np.where(df['qa_corr'] > 0.9, 'Not Evasive', 'Evasive')
df.rename(columns={'qa_corr': 'finBERT Evasion Present'}, inplace=True)
if 'qa_corr_summarised' in df.columns:
df['qa_corr_summarised'] = df['qa_corr_summarised'].astype(float)
df['qa_corr_summarised'] = np.where(df['qa_corr_summarised'] > 0.9, 'Not Evasive', 'Evasive')
df.rename(columns={'qa_corr_summarised': 'finBERT(summarised) Evasion Present'}, inplace=True)
# Set index to uid and drop index column
df = df.set_index('uid', drop=True)
return df
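The 0.9 cut-off behaves as follows on toy values. Note the strict inequality: a correlation of exactly 0.9 is still labelled Evasive. This is a minimal sketch mirroring the helper above, on made-up numbers:

```python
import numpy as np
import pandas as pd

# Toy correlation values; the 0.9 threshold mirrors calculate_finBERT_evasion
toy = pd.DataFrame({'uid': ['a', 'b', 'c'], 'qa_corr': [0.95, 0.9, 0.2]})
toy['qa_corr'] = np.where(toy['qa_corr'].astype(float) > 0.9, 'Not Evasive', 'Evasive')
print(toy['qa_corr'].tolist())
# → ['Not Evasive', 'Evasive', 'Evasive']
```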
2.4.1.1 Prompts
# Define prompt questions for Evasion, Sentiment and Topic analysis with Phi-3.5 and Q&A tables of pre-processed financial transcripts
# All questions (the evasion prompt requires the aggregated dataset; the analysis function will stop if given non-aggregated data)
combined_questions = [
f"1. Analyze the following financial question-answer pairs. Determine if the answer is evasive or non-evasive. Consider the following factors:\n"
"1. Deception: Is the answer ambiguous, misleading or trying to conceal something? 2. Relevance: Is the information provided relevant to the question? 3. Specificity: Is the answer specific enough to provide an informative response in a financial context?\n\n"
"Example of an Evasive answer: What is the company's strategy for addressing the rising costs of raw materials? We are taking positive steps to mitigate the impact of rising costs. Explanation: This is evasive because the answer does not give any detail about the strategy. \n\n"
"Example of a non-evasive answer: How will the recent interest rate hike impact the bank's net interest margin? The recent interest rate hike is expected to positively impact the bank's net interest margin, as the bank's interest-earning assets tend to reprice faster than its interest-bearing liabilities. Explanation: This is a non-evasive answer as it clearly states the impact.\n\n"
"If evasion is detected, classify it as Evasive, with a level of 'Low', 'Moderate', or 'High'. For each instance of evasion, identify up to 5 specific topics that were not adequately addressed.\n"
"Each theme should be a short keyword or phrase summarizing the topic, e.g., '2024 consensus,' 'NII Markets,' 'Asset-Sensitive Number.'\n"
"Your response should use the format:\n\n"
"Evasive: [Low/Moderate/High]; [topic1, topic2, topic3]\n\n",
f"2. Categorise the sentiment in this text as positive, negative, or neutral. Provide relative percentage scores and up to 5 concise keywords or themes contributing to each sentiment, suitable for direct input into a table. Use the following format:\n\n"
"Positive: %; [theme1, theme2, theme3]\n"
"Negative: %; [theme1, theme2, theme3]\n"
"Neutral: %; [theme1, theme2, theme3]\n"
"Each theme should be a short keyword or phrase summarizing the topic, e.g., 'NII,' '2024 consensus,' or 'Asset-Sensitive Number.'\n"
f"3. Identify the primary topic discussed in the following financial transcript and classify it as one of these categories: "
"Earnings, Financials, M&A | Investments, General News | Opinion, Fed | Central Banks, Company | Product News, Markets, Treasuries | Corporate Debt, Legal | Regulation, Macro, Energy | Oil, Currencies, Analyst Update, IPO, Dividend, Politics, Gold | Metals | Materials, Stock Movement, Personnel Change, Stock Commentary.\n"
"Do not make up other categories, assign it to the best fit from the categories provided. Your final response must use the format:\n\n"
"finBERT Topic: [category]\n\n"
]
# Sentiment only
sentiment_questions = [
f"1. Categorise the financial sentiment in this text as positive, negative, or neutral. Provide relative percentage scores and up to 5 concise keywords or themes contributing to each sentiment, suitable for direct input into a table. Use the following format:\n\n"
"Positive: %; [theme1, theme2, theme3]\n"
"Negative: %; [theme1, theme2, theme3]\n"
"Neutral: %; [theme1, theme2, theme3]\n"
"Each theme should be a short keyword or phrase summarizing the topic, e.g., 'NII,' '2024 consensus,' or 'Asset-Sensitive Number.'\n",
]
# Evasion only (needs aggregated QA dataset)
evasion_questions = [
f"Analyze the following financial question-answer pairs. Determine if the answer is evasive or non-evasive. Consider the following factors:\n"
"1. Deception: Is the answer ambiguous, misleading or trying to conceal something? 2. Relevance: Is the information provided relevant to the question? 3. Specificity: Is the answer specific enough to provide an informative response in a financial context?\n\n"
"Example of an Evasive answer: What is the company's strategy for addressing the rising costs of raw materials? We are taking positive steps to mitigate the impact of rising costs. Explanation: This is evasive because the answer does not give any detail about the strategy. \n\n"
"Example of a non-evasive answer: How will the recent interest rate hike impact the bank's net interest margin? The recent interest rate hike is expected to positively impact the bank's net interest margin, as the bank's interest-earning assets tend to reprice faster than its interest-bearing liabilities. Explanation: This is a non-evasive answer as it clearly states the impact.\n\n"
"If evasion is detected, classify it as Evasive, with a level of 'Low', 'Moderate', or 'High'. For each instance of evasion, identify up to 5 specific topics that were not adequately addressed.\n"
"Each theme should be a short keyword or phrase summarizing the topic, e.g., '2024 consensus,' 'NII Markets,' 'Asset-Sensitive Number.'\n"
"Your response should use the format:\n\n"
"Evasive: [Low/Moderate/High]; [topic1, topic2, topic3]\n\n"
]
# Topic assignment to match finBERT topics
finBERT_topic_questions = [
f"Identify the primary topic discussed in the following financial transcript and classify it as one of these categories: "
"Earnings, Financials, M&A | Investments, General News | Opinion, Fed | Central Banks, Company | Product News, Markets, Treasuries | Corporate Debt, Legal | Regulation, Macro, Energy | Oil, Currencies, Analyst Update, IPO, Dividend, Politics, Gold | Metals | Materials, Stock Movement, Personnel Change, Stock Commentary.\n"
"Do not make up other categories, assign it to the best fit from the categories provided. Your final response must use the format:\n\n"
"finBERT Topic: [category]\n\n"
]
2.4.1.2 Functions to run analyses with Phi-3.5
"""
Financial keywords to search for in text are defined here
"""
keyword_search_terms = ["SS3/24", "SS3", "SS three", "supervisory statement", "PRA", "credit risk definition of default", "definition of default", "Bank of England", "BoE", "England", "Basel", "Basel III", "Basel IV", "S9/24", "S9", "S nine"]
"""
Function to search provided text for financial keywords
- this can be done by whole pdf (as text file) or row by row of Q&A table
"""
# Search for keywords of interest
def keyword_search(text, keywords):
found_terms = [term for term in keywords if term.lower() in text.lower()]
if found_terms:
term_list = ', '.join(found_terms)
return term_list
else:
return "None"
"""
Function to run phi-3.5 on tabular questions and answers.
This will take as input either the aggregated Q&A pairs (for combined or separate question-evasion and sentiment/topic analysis) or separate Q, A and N rows table (for sentiment and topic analysis only)
The prompt used can vary but must produce the following output format (where applicable for Sentiment or Evasion analysis):
"Positive: %; [theme1, theme2, theme3]
Negative: %; [theme1, theme2, theme3]
Neutral: %; [theme1, theme2, theme3]
Each theme should be a short keyword or phrase summarizing the topic, e.g., 'NII,' '2024 consensus,' or 'Asset-Sensitive Number.'
Evasive: [Low/Moderate/High]; [topic1, topic2, topic3]
Each theme should be a short keyword or phrase summarizing the topic, e.g., '2024 consensus,' 'NII Markets,' 'Asset-Sensitive Number.''
"
The non-aggregated 'separate Q, A and N rows table' input option is intended only for ground-truth assessment of performance against other methods such as BERTopic, finBERT and RoBERTa.
Analysis path is determined by presence of 'sentiment' and/or 'Evasive' in the prompt
There is a sense check against non-aggregated data being accidentally input with the evasive prompt (as this only works with aggregated data).
"""
# Function to run phi-3.5 on tabular questions and answers
def phi_question_answer(input_df, prompt_questions, input_aggregated='N', input_col='qa_text'):
print("input_aggregated:", input_aggregated)
# Set up progress counter and timer
start_time = time.time()
total_count = len(input_df)
x = 0
# Determine table columns based on the questions asked
columns = ["uid", "keywords"]
if any("Evasive" in q for q in prompt_questions):
columns.extend(["Phi-3.5 Evasion Present", "Phi-3.5 Evasion Degree", "Phi-3.5 Evaded Topics"])
# Check that aggregated flag is on for evasion input data - if not exit function with warning
if input_aggregated == 'N':
print("Error: Evasion question present in prompt but non-aggregated dataset provided. Analysis stopped.")
return None
if any("sentiment" in q for q in prompt_questions):
columns.extend(["Phi-3.5 Sentiment", "Phi-3.5 Positive %", "Phi-3.5 Negative %", "Phi-3.5 Neutral %", "Phi-3.5 Positive Topics", "Phi-3.5 Negative Topics", "Phi-3.5 Neutral Topics"])
if any("finBERT" in q for q in prompt_questions):
columns.extend(["Phi-3.5 Topic Classification"])
# Initialise PrettyTable with dynamic columns
table = PrettyTable(columns)
# Loop through rows in phi_input_df and apply keyword search and Phi3.5 questions to each row
for i, row in input_df.iterrows():
# Iterate counter
x += 1
# Check for keywords
text = row[input_col]
keywords = keyword_search(text, keyword_search_terms)
if keywords != 'None':
print(f"Keywords found: {keywords}")
if x == 1 or x % 5 == 0:
print(f"{x}/{total_count} ...")
# Initialise variables to store extracted data (depending on questions asked)
row_data = {"uid": i, "keywords": keywords}
pos_value, neg_value, neut_value = 0, 0, 0
if "Phi-3.5 Positive %" in columns:
row_data.update({"Phi-3.5 Sentiment": "N/A", "Phi-3.5 Positive %": "N/A", "Phi-3.5 Negative %": "N/A", "Phi-3.5 Neutral %": "N/A", "Phi-3.5 Positive Topics": "N/A", "Phi-3.5 Negative Topics": "N/A", "Phi-3.5 Neutral Topics": "N/A"})
if "Phi-3.5 Evasion Present" in columns:
row_data.update({"Phi-3.5 Evasion Present": "N/A", "Phi-3.5 Evasion Degree": "N/A", "Phi-3.5 Evaded Topics": "N/A"})
if "Phi-3.5 Topic Classification" in columns:
row_data.update({"Phi-3.5 Topic Classification": "N/A"})
# Answer prompt questions with Phi-3.5 on text from each row
for question in prompt_questions:
prompt = f"Context: {text}\n\nQuestion: {question}\nAnswer:"
output = pipe(prompt, max_new_tokens=130, min_length=20, do_sample=False, eos_token_id=pipe.tokenizer.eos_token_id)
answer = output[0]["generated_text"].replace(prompt, "").strip()
#print(answer)
# Extract Topic Classification information from the answer, if finBERT classification question asked
if "finBERT" in question:
if 'finBERT Topic:' in answer:
row_data["Phi-3.5 Topic Classification"] = answer.split('Topic: ')[1].split('\n')[0].strip() if 'Topic: ' in answer else "N/A"
# Extract sentiment-related information from the answer, if sentiment question asked
if "sentiment" in question:
if "Positive:" in answer:
pos_part = answer.split("Positive: ")[1]
pos_value = float(pos_part.split(";")[0].strip().replace("%", "")) if pos_part else 0
row_data["Phi-3.5 Positive %"] = pos_value
row_data["Phi-3.5 Positive Topics"] = pos_part.split("[")[1].split("]")[0].strip() if "[" in pos_part and "]" in pos_part else "N/A"
if "Negative:" in answer:
neg_part = answer.split("Negative: ")[1]
neg_value = float(neg_part.split(";")[0].strip().replace("%", "")) if neg_part else 0
row_data["Phi-3.5 Negative %"] = neg_value
row_data["Phi-3.5 Negative Topics"] = neg_part.split("[")[1].split("]")[0].strip() if "[" in neg_part and "]" in neg_part else "N/A"
if "Neutral:" in answer:
neutral_part = answer.split("Neutral: ")[1]
neut_value = float(neutral_part.split(";")[0].strip().replace("%", "")) if neutral_part else 0
row_data["Phi-3.5 Neutral %"] = neut_value
row_data["Phi-3.5 Neutral Topics"] = neutral_part.split("[")[1].split("]")[0].strip() if "[" in neutral_part and "]" in neutral_part else "N/A"
# Determine the overall sentiment based on the highest percentage value, checking for "Neutral" first, with logic conflicts also resulting in Neutral
if neut_value >= pos_value and neut_value >= neg_value:
row_data["Phi-3.5 Sentiment"] = "Neutral"
elif neg_value >= pos_value and neg_value >= neut_value:
row_data["Phi-3.5 Sentiment"] = "Negative"
elif pos_value > neg_value and pos_value > neut_value:
row_data["Phi-3.5 Sentiment"] = "Positive"
else:
row_data["Phi-3.5 Sentiment"] = "Neutral"
# Extract question evasion determination from the answer, if evasion question asked
if "Evasive" in question:
if "Evasive" in answer:
row_data["Phi-3.5 Evasion Present"] = "Evasive"
# If evasion degree is Low, downgrade the label to Not Evasive
if "Low" in answer:
row_data["Phi-3.5 Evasion Degree"] = "Low"
row_data["Phi-3.5 Evasion Present"] = "Not Evasive"
elif "Moderate" in answer: row_data["Phi-3.5 Evasion Degree"] = "Moderate"
elif "High" in answer: row_data["Phi-3.5 Evasion Degree"] = "High"
else:
row_data["Phi-3.5 Evasion Present"] = "Not Evasive"
row_data["Phi-3.5 Evasion Degree"] = "None"
row_data["Phi-3.5 Evaded Topics"] = answer.split('[')[1].split(']')[0].strip() if '[' in answer and ']' in answer else "N/A"
# Add the data to the PrettyTable
table.add_row([row_data[col] for col in columns])
# Convert PrettyTable to a DataFrame and merge with original table
phi_temp_df = pd.DataFrame(table.rows, columns=table.field_names)
phi_final_df = pd.merge(input_df, phi_temp_df, left_index=True, right_on="uid", how="outer")
phi_final_df.set_index('uid', inplace=True)
phi_final_df = phi_final_df.sort_values(by='qa_num')
# View df
display(phi_final_df.head())
# Calculate time for full dataset
end_time = time.time()
time_taken = end_time - start_time
print(f"Time taken for {len(phi_final_df)} QA: {round(time_taken/60, 2)} minutes")
if input_aggregated == 'Y':
print(f"Estimate for all transcripts: {round((time_taken/len(phi_final_df) * 375)/60/60, 2)} hours")
else:
print(f"Estimate for all transcripts: {round((time_taken/len(phi_final_df) * 927)/60/60, 2)} hours")
return phi_final_df
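The string extraction inside phi_question_answer can be isolated as a small helper. This sketch (parse_sentiment_line is a hypothetical name, not defined in the notebook) shows how one "Label: %; [themes]" line of a Phi-3.5 response is parsed:

```python
def parse_sentiment_line(answer, label):
    """Extract the percentage and bracketed themes for one sentiment label (sketch of the logic above)."""
    if f"{label}:" not in answer:
        return None, "N/A"
    part = answer.split(f"{label}: ")[1]
    value = float(part.split(";")[0].strip().replace("%", ""))
    topics = part.split("[")[1].split("]")[0].strip() if "[" in part and "]" in part else "N/A"
    return value, topics

demo = "Positive: 60%; [NII, buybacks]\nNegative: 10%; [legal costs]\nNeutral: 30%; [guidance]"
print(parse_sentiment_line(demo, "Positive"))
# → (60.0, 'NII, buybacks')
```

Splitting on the semicolon before the bracketed list is what makes the percentage and the themes independently recoverable, which is why the prompts insist on that exact output format.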
2.4.1.3 Plot functions
"""
Functions to calculate accuracy metrics for sentiment and evasion comparisons and display barcharts and confusion matrices for Phi results
"""
# Function to compare all columns with 'Sentiment' or 'Evasion Present' in the column name against 'True Sentiment' or 'True Evasion Present' column for each row
def calculate_accuracy(df, sentiment_flag='Y', evasion_flag='Y', topic_flag='Y'):
accuracy_results = {
'Sentiment Accuracy': {},
'Evasion Accuracy': {},
'Topic Accuracy': {}
}
threshold_results = {
'Sentiment Threshold': 0.33,
'Evasion Threshold': 0.5,
'Topic Threshold': 0.2
}
num_sentiments = 3
num_evasion = 2
num_topics = 20
# Find all columns that contain 'Sentiment' and compare them against 'True Sentiment'
sentiment_columns = [col for col in df.columns if 'sentiment' in col.lower() and col != 'True Sentiment']
if sentiment_columns:
for col in sentiment_columns:
sentiment_accuracy = ((df[col].astype(str).str.lower() == df['True Sentiment'].astype(str).str.lower()).sum() / len(df)) * 100
accuracy_results['Sentiment Accuracy'][col] = round(sentiment_accuracy, 2)
unique_values = df[col].unique()
threshold_results['Sentiment Threshold'] = round((100 / num_sentiments), 2)
else:
print("No sentiment columns found.")
# Find all columns that contain 'Evasion Present' and compare them against 'True Evasion Present'
evasion_columns = [col for col in df.columns if 'evasion present' in col.lower() and col != 'True Evasion Present']
if evasion_columns:
for col in evasion_columns:
evasion_accuracy = ((df[col].astype(str).str.lower() == df['True Evasion Present'].astype(str).str.lower()).sum() / len(df)) * 100
accuracy_results['Evasion Accuracy'][col] = round(evasion_accuracy, 2)
unique_values = df[col].unique()
threshold_results['Evasion Threshold'] = round((100 / num_evasion), 2)
else:
print("No evasion columns found.")
# Find all columns that contain 'Classification' and compare them against 'True Topic'
topic_columns = [col for col in df.columns if 'classification' in col.lower() and col != 'True Topic']
if topic_columns:
for col in topic_columns:
topic_accuracy = ((df[col].astype(str).str.lower() == df['True Topic'].astype(str).str.lower()).sum() / len(df)) * 100
accuracy_results['Topic Accuracy'][col] = round(topic_accuracy, 2)
unique_values = df[col].unique()
threshold_results['Topic Threshold'] = round((100 / num_topics), 2)
else:
print("No topic classification columns found.")
# Print the accuracy results
if accuracy_results['Topic Accuracy'] and topic_flag == 'Y':
print(f"\nTopic Accuracy Results (Significance Threshold {threshold_results['Topic Threshold']}%):")
for method, acc in accuracy_results['Topic Accuracy'].items():
print(f"{method}: {acc}%")
if accuracy_results['Sentiment Accuracy'] and sentiment_flag == 'Y':
print(f"\nSentiment Accuracy Results (Significance Threshold {threshold_results['Sentiment Threshold']}%):")
for method, acc in accuracy_results['Sentiment Accuracy'].items():
print(f"{method}: {acc}%")
if accuracy_results['Evasion Accuracy'] and evasion_flag == 'Y':
print(f"\nEvasion Accuracy Results (Significance Threshold {threshold_results['Evasion Threshold']}%):")
for method, acc in accuracy_results['Evasion Accuracy'].items():
print(f"{method}: {acc}%")
plot_accuracy_charts(accuracy_results, df, threshold_results, sentiment_flag, evasion_flag, topic_flag)
return accuracy_results
# Function to plot separate accuracy barcharts for sentiment and evasion comparisons
def plot_accuracy_charts(accuracy_results, df, threshold_results, sentiment_flag='Y', evasion_flag='Y', topic_flag='Y'):
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
plot_idx = 0
threshold_line_colour = 'darkgrey'
# Plot Topic Accuracy
a2 = 0
if accuracy_results['Topic Accuracy'] and topic_flag == 'Y':
plot_name = 'Topic Accuracy'
threshold = threshold_results.get('Topic Threshold', 0)
labels = [label.split()[0] for label in accuracy_results[plot_name].keys()]
axes[0, a2].bar(accuracy_results[plot_name].keys(), accuracy_results[plot_name].values(), color='darkgreen')
axes[0, a2].set_title(plot_name, fontsize=16)
axes[0, a2].set_ylabel('Accuracy %', fontsize=14)
axes[0, a2].set_ylim(0, 100)
axes[0, a2].set_xticks(range(len(accuracy_results[plot_name])))
axes[0, a2].set_xticklabels(labels, rotation=45, ha='right')
axes[0, a2].tick_params(axis='y', labelsize=14)
axes[0, a2].tick_params(axis='x', labelsize=14)
# Add a dashed horizontal line at the chance threshold
axes[0, a2].axhline(y=threshold, color=threshold_line_colour, linestyle='--')
# Annotate each bar with the accuracy value
for i, (label, value) in enumerate(accuracy_results[plot_name].items()):
axes[0, a2].text(i, value -1, f'{value:.1f}%', ha='center', va='top', fontsize=12, color='white')
else:
axes[0, a2].axis('off')
# Plot Sentiment Accuracy
a2 = 1
if accuracy_results['Sentiment Accuracy'] and sentiment_flag == 'Y':
plot_name = 'Sentiment Accuracy'
threshold = threshold_results.get('Sentiment Threshold', 0)
labels = [label.split()[0] for label in accuracy_results[plot_name].keys()]
axes[0, a2].bar(accuracy_results[plot_name].keys(), accuracy_results[plot_name].values(), color='royalblue')
axes[0, a2].set_title(plot_name, fontsize=16)
axes[0, a2].set_ylabel('Accuracy %', fontsize=14)
axes[0, a2].set_ylim(0, 100)
axes[0, a2].set_xticks(range(len(accuracy_results[plot_name])))
axes[0, a2].set_xticklabels(labels, rotation=45, ha='right')
axes[0, a2].tick_params(axis='y', labelsize=14)
axes[0, a2].tick_params(axis='x', labelsize=14)
# Add a dashed horizontal line at the chance threshold
axes[0, a2].axhline(y=threshold, color=threshold_line_colour, linestyle='--')
# Annotate each bar with the accuracy value
for i, (label, value) in enumerate(accuracy_results[plot_name].items()):
axes[0, a2].text(i, value -1, f'{value:.1f}%', ha='center', va='top', fontsize=12, color='white')
else:
axes[0, a2].axis('off')
# Plot Evasion Accuracy
a2 = 2
if accuracy_results['Evasion Accuracy'] and evasion_flag == 'Y':
plot_name = 'Evasion Accuracy'
threshold = threshold_results.get('Evasion Threshold', 0)
labels = [label.split()[0] for label in accuracy_results[plot_name].keys()]
axes[0, a2].bar(accuracy_results[plot_name].keys(), accuracy_results[plot_name].values(), color='brown')
axes[0, a2].set_title(plot_name, fontsize=16)
axes[0, a2].set_ylabel('Accuracy %', fontsize=14)
axes[0, a2].set_ylim(0, 100)
axes[0, a2].set_xticks(range(len(accuracy_results[plot_name])))
axes[0, a2].set_xticklabels(labels, rotation=45, ha='right')
axes[0, a2].tick_params(axis='y', labelsize=14)
axes[0, a2].tick_params(axis='x', labelsize=14)
# Add a dashed horizontal line at the chance threshold
axes[0, a2].axhline(y=threshold, color=threshold_line_colour, linestyle='--')
# Annotate each bar with the accuracy value
for i, (label, value) in enumerate(accuracy_results[plot_name].items()):
axes[0, a2].text(i, value -1, f'{value:.1f}%', ha='center', va='top', fontsize=12, color='white')
else:
axes[0, a2].axis('off')
# Find the model with the highest accuracy for each category
best_sentiment_model = max(accuracy_results['Sentiment Accuracy'], key=accuracy_results['Sentiment Accuracy'].get, default=None)
best_evasion_model = max(accuracy_results['Evasion Accuracy'], key=accuracy_results['Evasion Accuracy'].get, default=None)
best_topic_model = max(accuracy_results['Topic Accuracy'], key=accuracy_results['Topic Accuracy'].get, default=None)
# Plot confusion matrix for best Topic Classification model
a2 = 0
if 'True Topic' in df.columns and best_topic_model in df.columns and topic_flag == 'Y':
cm_best_topics = confusion_matrix(df['True Topic'], df[best_topic_model], labels=df['True Topic'].unique())
sns.heatmap(cm_best_topics, annot=True, fmt='d', cmap='Greens', ax=axes[1, a2], xticklabels=df['True Topic'].unique(), yticklabels=df['True Topic'].unique(), annot_kws={"size": 16})
axes[1, a2].set_title(f'{best_topic_model.split()[0]}: Best Model\nTopic Classification', fontsize=14)
axes[1, a2].set_xlabel('Predicted', fontsize=14)
axes[1, a2].set_ylabel('True', fontsize=14)
axes[1, a2].tick_params(axis='y', labelsize=12)
axes[1, a2].tick_params(axis='x', labelsize=12)
else:
axes[1, a2].axis('off')
# Plot confusion matrix for best Sentiment model
a2 = 1
if 'True Sentiment' in df.columns and best_sentiment_model in df.columns and sentiment_flag == 'Y':
cm_best_sentiment = confusion_matrix(df['True Sentiment'], df[best_sentiment_model], labels=df['True Sentiment'].unique())
sns.heatmap(cm_best_sentiment, annot=True, fmt='d', cmap='Blues', ax=axes[1, a2], xticklabels=df['True Sentiment'].unique(), yticklabels=df['True Sentiment'].unique(), annot_kws={"size": 16})
axes[1, a2].set_title(f'{best_sentiment_model.split()[0]}: Best Model\nSentiment Analysis', fontsize=14)
axes[1, a2].set_xlabel('Predicted', fontsize=14)
axes[1, a2].set_ylabel('True', fontsize=14)
axes[1, a2].tick_params(axis='y', labelsize=12)
axes[1, a2].tick_params(axis='x', labelsize=12)
else:
axes[1, a2].axis('off')
# Plot confusion matrix for best Evasion model
a2 = 2
if 'True Evasion Present' in df.columns and best_evasion_model in df.columns and evasion_flag == 'Y':
cm_best_evasion = confusion_matrix(df['True Evasion Present'], df[best_evasion_model], labels=df['True Evasion Present'].unique())
sns.heatmap(cm_best_evasion, annot=True, fmt='d', cmap='Reds', ax=axes[1, a2], xticklabels=df['True Evasion Present'].unique(), yticklabels=df['True Evasion Present'].unique(), annot_kws={"size": 16})
axes[1, a2].set_title(f'{best_evasion_model.split()[0]}: Best Model\nQuestion Evasion', fontsize=14)
axes[1, a2].set_xlabel('Predicted', fontsize=14)
axes[1, a2].set_ylabel('True', fontsize=14)
axes[1, a2].tick_params(axis='y', labelsize=12)
axes[1, a2].tick_params(axis='x', labelsize=12)
else:
axes[1, a2].axis('off')
plt.tight_layout()
plt.show()
2.4.2 QA Evasion, Sentiment Analysis and Topic Modelling on Synthetic Dataset
2.4.2.0 Preparing synthetic data tables
2.4.2.0.1 Running RoBERTa on the synthetic dataset
def get_synthetic(data_folder):
data = pd.read_csv(os.path.join(data_folder, "Synthetic_Data.csv"))
data['true_sentiment'] = data['True Sentiment'].str.lower()
return data
# load the synthetic dataset
synthetic_data = get_synthetic(processed_data_folder)
synthetic_data['true_score'] = synthetic_data['true_sentiment'].map(score_dict)
# get response from RoBERTa
roberta_model = ClassificationModel("financial-roberta-large", "soleimanian/financial-roberta-large-sentiment")
synthetic_data = roberta_model.get_model_response_for_df(synthetic_data, 'qa_text')
synthetic_data.rename(columns={'financial-roberta-large_sentiment':'RoBERTa Sentiment'}, inplace=True)
synthetic_data = synthetic_data[['uid', 'RoBERTa Sentiment']]
# save the result
synthetic_data.to_csv(output_data_folder + '/sentiment_eval_result_synthetic.csv')
2.4.2.0.2 Running finBERT on the synthetic dataset
def preprocess_synthetic_data(data_folder):
df_synthetic = pd.read_csv(os.path.join(data_folder, "Synthetic_Data.csv"))
df_synthetic_processed = pd.DataFrame(columns = ['uid', 'qa_type', 'qa_num', 'qa_text'])
for row in df_synthetic.itertuples():
q,a = row.qa_text.split("?")
q += '?'
rows_to_append = pd.DataFrame({
'uid': [row.uid+'_Q', row.uid+'_A'],
'qa_type': ['Q','A'],
'qa_num': [row.qa_num]*2,
'qa_text': [q, a]
})
df_synthetic_processed = pd.concat([df_synthetic_processed, rows_to_append], ignore_index=True)
save_path = os.path.join(data_folder, f"synthetic_data_for_finbert.csv")
df_synthetic_processed.to_csv(save_path, index=False)
preprocess_synthetic_data(processed_data_folder)
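The splitting rule inside preprocess_synthetic_data can be illustrated in isolation. Using a maxsplit of 1 cuts the combined text at the first '?' (which is then re-attached to the question half); the text here is a made-up example, not from the dataset:

```python
# Hypothetical combined Q&A text, for illustration only
qa_text = "What drove the increase in net interest income? Primarily higher rates."
q, a = qa_text.split("?", 1)
q += "?"  # re-attach the question mark consumed by split
print(q)           # the question half
print(a.strip())   # the answer half
```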
process_finbert(
    finbert_tokeniser=finbert_topic_tokeniser,
    finbert_model=finbert_topic_model,
    save_folder=output_data_folder,
    label_dict=id2label,
    data_folder=processed_data_folder,
    chunking=False,
    max_length=512,
    qa_only=True,
    synthetic=True,
    appdx='_synthetic'
)
# Compute question evasion
plot_answers_to_questions(finbert_folder=output_data_folder,
                          data_folder=processed_data_folder,
                          label_dict=id2label,
                          chunking=False,
                          max_length=512,
                          bank="JPMorgan",
                          datatype="QA",
                          appdx="_synthetic",
                          synthetic=True,
                          save=True)
2.4.2.0.3 Create synthetic data table¶
# Load Synthetic data files (change to filepaths in Inputs/Outputs folder later)
synthetic_QA_df = pd.read_excel(processed_data_folder + "Synthetic_QA_dataset.xlsx")
# Get RoBERTa synthetic output file
synthetic_Roberta_result = pd.read_csv(processed_data_folder + "sentiment_eval_result_synthetic.csv")
# Get finBERT synthetic output files
synthetic_finBERT_result_corr = pd.read_csv(output_data_folder + "finbert_QA_topic_correlations_synthetic.csv")
synthetic_finBERT_result_topics = pd.read_csv(output_data_folder + "finbert_topics_JPMorgan_chunkingFalse_maxlength512_synthetic.csv")
# Merge RoBERTa Sentiment column into Synthetic results table
synthetic_QA_df = pd.merge(synthetic_QA_df, synthetic_Roberta_result[['uid', 'RoBERTa Sentiment']], on='uid', how='left')
# Drop _Q entries in uid column from synthetic_finBERT_result_topics and remove _A to match uid in other tables
synthetic_finBERT_result_topics = (
synthetic_finBERT_result_topics[~synthetic_finBERT_result_topics['uid'].str.contains('_Q')]
.assign(uid=lambda x: x['uid'].str.replace('_A', '', regex=False))
)
# Merge specific columns from finBERT tables into Synthetic results table
synthetic_QA_df = pd.merge(synthetic_QA_df, synthetic_finBERT_result_corr[['uid', 'qa_corr']], on='uid', how='left')
synthetic_QA_df = pd.merge(synthetic_QA_df, synthetic_finBERT_result_topics[['uid', 'finbert_topic_label']], on='uid', how='left')
synthetic_QA_df.rename(columns={'finbert_topic_label': 'finBERT Topic Classification'}, inplace=True)
# Calculate evasion for finBERT on synthetic data (also re-sets index to uid)
synthetic_QA_df = calculate_finBERT_evasion(synthetic_QA_df)
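calculate_finBERT_evasion is defined earlier in the notebook; conceptually it labels a Q&A pair as Evasive when the finBERT question/answer topic correlation (qa_corr) falls below a cut-off. A minimal sketch of that idea, in which the threshold value and demo data are assumptions for illustration only:

```python
import pandas as pd

def flag_evasion(df, corr_col="qa_corr", threshold=0.5):
    # A pair is treated as 'Evasive' when the question/answer
    # topic correlation is below the (assumed) threshold.
    out = df.copy()
    out["finBERT Evasion Present"] = out[corr_col].apply(
        lambda c: "Evasive" if c < threshold else "Not Evasive"
    )
    return out

demo = pd.DataFrame({"uid": ["synthetic_1", "synthetic_2"], "qa_corr": [0.12, 0.87]})
print(flag_evasion(demo)["finBERT Evasion Present"].tolist())  # ['Evasive', 'Not Evasive']
```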
2.4.2.1 Running Phi-3.5 on the synthetic dataset¶
"""
Assess accuracy of Phi3.5 with provided prompt against GPT4-generated synthetic dataset,
with benchmarking against 'True' human-confirmed labels and finBERT/RoBERTa.
A small synthetic financial Q&A dataset with combined/aggregated questions and answers in a 'qa_text' column
Question-answer pairs were generated by manual adaption of GPT-4o-mini responses
4x Positive sentiment with: No Evasion, Low Evasion, Moderate Evasion, High Evasion
4x Negative sentiment with: No Evasion, Low Evasion, Moderate Evasion, High Evasion
4x Neutral sentiment with: No Evasion, Low Evasion, Moderate Evasion, High Evasion
All were checked manually and wording adapted where required to conform to human-accurate sentiment and evasion categorisation
This dataset (synthetic_QA_df) is designed to assess Phi3.5 accuracy against human-labelled data. It has first been run through RoBERTa and finBERT to create a benchmarking set.
"""
# Run Phi-3.5 function on synthetic data for classification into finBERT topic categories
print("Running Text Classification analysis")
synthetic_QA_df_topics = phi_question_answer(synthetic_QA_df, finBERT_topic_questions, input_aggregated='Y', input_col='qa_text')
# Run Phi-3.5 function on synthetic data for evasion
print("Running Evasion analysis")
synthetic_QA_df_evasion = phi_question_answer(synthetic_QA_df, evasion_questions, input_aggregated='Y', input_col='qa_text')
# Run Phi-3.5 function on synthetic data for sentiment
print("Running Sentiment analysis")
synthetic_QA_df_sentiment = phi_question_answer(synthetic_QA_df, sentiment_questions, input_aggregated='Y', input_col='qa_text')
Running Text Classification analysis input_aggregated: Y 1/12 ...
5/12 ... 10/12 ...
| qa_type | qa_num | qa_text | True Sentiment | True Evasion Present | True Topic Classification | RoBERTa Sentiment | finBERT Evasion Present | finBERT Topic Classification | keywords | Phi-3.5 Topic Classification | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| uid | |||||||||||
| synthetic_1 | QA | 1 | What is the impact of the new marketing strate... | Positive | Not Evasive | Macro | Positive | Evasive | Macro | None | Markets |
| synthetic_2 | QA | 2 | How do you view the impact of the current inte... | Positive | Evasive | Fed | Central Banks | Positive | Evasive | Fed | Central Banks | None | Fed | Central Banks |
| synthetic_3 | QA | 3 | Can you explain the benefits of the new financ... | Positive | Evasive | Company | Product News | Positive | Evasive | Company | Product News | None | Company | Product News |
| synthetic_4 | QA | 4 | What measures are being implemented to improve... | Positive | Not Evasive | Company | Product News | Positive | Evasive | General News | Opinion | None | Risk Management |
| synthetic_5 | QA | 5 | What are the recent trends in loan default rat... | Neutral | Not Evasive | Financials | Neutral | Evasive | Treasuries | Corporate Debt | None | General News |
Time taken for 12 QA: 1.37 minutes Estimate for all transcripts: 0.71 hours Running Evasion analysis input_aggregated: Y 1/12 ... 5/12 ... 10/12 ...
| qa_type | qa_num | qa_text | True Sentiment | True Evasion Present | True Topic Classification | RoBERTa Sentiment | finBERT Evasion Present | finBERT Topic Classification | keywords | Phi-3.5 Evasion Present | Phi-3.5 Evasion Degree | Phi-3.5 Evaded Topics | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| uid | |||||||||||||
| synthetic_1 | QA | 1 | What is the impact of the new marketing strate... | Positive | Not Evasive | Macro | Positive | Evasive | Macro | None | Evasive | Moderate | loan growth projections, marketing strategy, u... |
| synthetic_2 | QA | 2 | How do you view the impact of the current inte... | Positive | Evasive | Fed | Central Banks | Positive | Evasive | Fed | Central Banks | None | Evasive | Moderate | strategy details, cost mitigation, impact anal... |
| synthetic_3 | QA | 3 | Can you explain the benefits of the new financ... | Positive | Evasive | Company | Product News | Positive | Evasive | Company | Product News | None | Evasive | Moderate | Strategy Details, Impact Analysis, Timeline, C... |
| synthetic_4 | QA | 4 | What measures are being implemented to improve... | Positive | Not Evasive | Company | Product News | Positive | Evasive | General News | Opinion | None | Evasive | Moderate | Strategy Details, Impact Analysis, Cost Mitiga... |
| synthetic_5 | QA | 5 | What are the recent trends in loan default rat... | Neutral | Not Evasive | Financials | Neutral | Evasive | Treasuries | Corporate Debt | None | Not Evasive | Low | Interest rate impact, Net interest margin, Ban... |
Time taken for 12 QA: 1.38 minutes Estimate for all transcripts: 0.72 hours Running Sentiment analysis input_aggregated: Y 1/12 ... 5/12 ... 10/12 ...
| qa_type | qa_num | qa_text | True Sentiment | True Evasion Present | True Topic Classification | RoBERTa Sentiment | finBERT Evasion Present | finBERT Topic Classification | keywords | Phi-3.5 Sentiment | Phi-3.5 Positive % | Phi-3.5 Negative % | Phi-3.5 Neutral % | Phi-3.5 Positive Topics | Phi-3.5 Negative Topics | Phi-3.5 Neutral Topics | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| uid | |||||||||||||||||
| synthetic_1 | QA | 1 | What is the impact of the new marketing strate... | Positive | Not Evasive | Macro | Positive | Evasive | Macro | None | Positive | 100.0 | 0.0 | 0.0 | loan growth, marketing strategy, increased pro... | ||
| synthetic_2 | QA | 2 | How do you view the impact of the current inte... | Positive | Evasive | Fed | Central Banks | Positive | Evasive | Fed | Central Banks | None | Positive | 90.0 | 10.0 | 0.0 | interest rates, lending capabilities, economic... | ||
| synthetic_3 | QA | 3 | Can you explain the benefits of the new financ... | Positive | Evasive | Company | Product News | Positive | Evasive | Company | Product News | None | Positive | 100.0 | 0.0 | 0.0 | competitive rates, innovative features, market... | ||
| synthetic_4 | QA | 4 | What measures are being implemented to improve... | Positive | Not Evasive | Company | Product News | Positive | Evasive | General News | Opinion | None | Positive | 70.0 | 0.0 | 30.0 | Risk Management, Real-Time Data, Mitigation, A... | Financial Context, Measures, Implementation, M... | |
| synthetic_5 | QA | 5 | What are the recent trends in loan default rat... | Neutral | Not Evasive | Financials | Neutral | Evasive | Treasuries | Corporate Debt | None | Neutral | 0.0 | 0.0 | 100.0 | Stability, No Increase, No Decrease | Loan Default Rates, Recent Trends, Financial S... |
Time taken for 12 QA: 1.38 minutes Estimate for all transcripts: 0.72 hours
# Merge unique columns in synthetic tables
unique_columns_in_topics = [
col for col in synthetic_QA_df_topics.columns if col not in synthetic_QA_df_evasion.columns
]
unique_columns_in_sentiment = [
col for col in synthetic_QA_df_sentiment.columns if col not in synthetic_QA_df_evasion.columns
]
synthetic_QA_df_results = pd.merge(synthetic_QA_df_evasion, synthetic_QA_df_topics[unique_columns_in_topics], left_index=True, right_on="uid", how="outer")
synthetic_QA_df_results = pd.merge(synthetic_QA_df_results, synthetic_QA_df_sentiment[unique_columns_in_sentiment], left_index=True, right_on="uid", how="outer")
synthetic_QA_df_results.head()
| qa_type | qa_num | qa_text | True Sentiment | True Evasion Present | True Topic Classification | RoBERTa Sentiment | finBERT Evasion Present | finBERT Topic Classification | keywords | ... | Phi-3.5 Evasion Degree | Phi-3.5 Evaded Topics | Phi-3.5 Topic Classification | Phi-3.5 Sentiment | Phi-3.5 Positive % | Phi-3.5 Negative % | Phi-3.5 Neutral % | Phi-3.5 Positive Topics | Phi-3.5 Negative Topics | Phi-3.5 Neutral Topics | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| uid | |||||||||||||||||||||
| synthetic_1 | QA | 1 | What is the impact of the new marketing strate... | Positive | Not Evasive | Macro | Positive | Evasive | Macro | None | ... | Moderate | loan growth projections, marketing strategy, u... | Markets | Positive | 100.0 | 0.0 | 0.0 | loan growth, marketing strategy, increased pro... | ||
| synthetic_10 | QA | 10 | What is the bank's strategy for addressing ris... | Negative | Evasive | Financials | Negative | Evasive | General News | Opinion | None | ... | Moderate | loan defaults strategy, specifics, deteriorati... | General News | Negative | 0.0 | 100.0 | 0.0 | N/A | loan defaults, strategy, deteriorating, non-di... | context, bank, addressing, rising |
| synthetic_11 | QA | 11 | What are the main challenges facing our loan p... | Negative | Evasive | Macro | Negative | Evasive | Macro | None | ... | Moderate | rising defaults, economic instability, loan po... | Loan Portfolio Challenges | Negative | 0.0 | 100.0 | 0.0 | N/A | loan defaults, economic instability, portfolio... | strategies, review, complexities |
| synthetic_12 | QA | 12 | What steps are being taken to enhance credit r... | Negative | Not Evasive | General News | Opinion | Negative | Evasive | General News | Opinion | None | ... | Moderate | strategy details, economic decline, future def... | Credit Risk Management | Negative | 10.0 | 85.0 | 5.0 | stricter criteria, enhancing strategies, credi... | struggling, economic decline, inadequate effor... | context, steps, taken |
| synthetic_2 | QA | 2 | How do you view the impact of the current inte... | Positive | Evasive | Fed | Central Banks | Positive | Evasive | Fed | Central Banks | None | ... | Moderate | strategy details, cost mitigation, impact anal... | Fed | Central Banks | Positive | 90.0 | 10.0 | 0.0 | interest rates, lending capabilities, economic... |
5 rows × 21 columns
synthetic_QA_df_results.to_csv(output_data_folder + 'phi_synthetic_QA_results.csv')
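The merge pattern used above (selecting only the columns unique to the right-hand frame so that shared columns are not duplicated with _x/_y suffixes) can be sketched in isolation with toy frames:

```python
import pandas as pd

# Toy frames sharing a 'shared' column, indexed by uid (illustration only)
left = pd.DataFrame({"uid": ["a", "b"], "shared": [1, 2], "left_only": [3, 4]}).set_index("uid")
right = pd.DataFrame({"uid": ["a", "b"], "shared": [1, 2], "right_only": [5, 6]}).set_index("uid")

# Keep only the right-hand columns not already present on the left
unique_cols = [c for c in right.columns if c not in left.columns]
merged = left.merge(right[unique_cols], left_index=True, right_index=True, how="outer")
print(sorted(merged.columns))  # ['left_only', 'right_only', 'shared'] -- no _x/_y duplicates
```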
2.4.2.2 Compare results with other models¶
# Calculate accuracies of Overall Sentiment, Topic and Question Evasion results
accuracy_results = calculate_accuracy(synthetic_QA_df_results)
Topic Accuracy Results (Significance Threshold 5.0%):
finBERT Topic Classification: 41.67%
Phi-3.5 Topic Classification: 33.33%
Sentiment Accuracy Results (Significance Threshold 33.33%):
RoBERTa Sentiment: 100.0%
Phi-3.5 Sentiment: 83.33%
Evasion Accuracy Results (Significance Threshold 50.0%):
finBERT Evasion Present: 41.67%
Phi-3.5 Evasion Present: 66.67%
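calculate_accuracy is defined earlier in the notebook; at its core, each accuracy figure reduces to a per-column match rate between a model's labels and the 'True' human labels. A minimal sketch (the column names follow the tables above; the demo values are made up):

```python
import pandas as pd

def column_accuracy(df, true_col, pred_col):
    # Share of rows (%) where the model label matches the human 'True' label
    valid = df[[true_col, pred_col]].dropna()
    return (valid[true_col] == valid[pred_col]).mean() * 100

demo = pd.DataFrame({
    "True Sentiment":    ["Positive", "Negative", "Neutral", "Positive"],
    "Phi-3.5 Sentiment": ["Positive", "Negative", "Positive", "Positive"],
})
print(column_accuracy(demo, "True Sentiment", "Phi-3.5 Sentiment"))  # 75.0
```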
SYNTHETIC DATA RESULTS
Question Evasion
Initial results showed that Phi-3.5 was over-classifying answers as Evasive (5 of the 6 Not Evasive answers were categorised as Evasive), which initially suggested poor performance.
The prompt wording was then adapted to refine the results, e.g. by adding examples of Not Evasive answers. This approach was successful and made the model more accurate (results above).
When the Phi-3.5 prompt focused on the directness of the answer, the results were similar to finBERT's, perhaps suggesting Phi-3.5 was interpreting directness in terms of the question topics covered. This 'direct' prompt was removed and is not shown here.
Note that Gemini was also tested on this dataset but was excluded from the analysis because it gave very poor Evasion results (likely because its prompt was not optimised). This illustrates the importance of prompt optimisation for such tasks, which was outside the scope of this project; therefore only GPT-4 was used for benchmarking accuracy.
Phi-3.5 outperformed finBERT on Evasion analysis. This might be expected, since finBERT was only using topic-classification correlation as a proxy for Evasion.
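The topic-correlation proxy works by correlating the question's topic-probability vector with the answer's: when the answer drifts to different topics, the correlation drops. An illustrative sketch with made-up distributions (the real vectors come from the finBERT pipeline earlier in the notebook):

```python
import numpy as np

# Assumed 4-topic probability vectors, for illustration only
q_topics = np.array([0.7, 0.1, 0.1, 0.1])  # question mostly about topic 0
a_topics = np.array([0.1, 0.1, 0.1, 0.7])  # answer mostly about topic 3

# Pearson correlation between the two distributions
qa_corr = np.corrcoef(q_topics, a_topics)[0, 1]
print(qa_corr < 0)  # True: mismatched topics give a low/negative correlation
```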
Sentiment Analysis
RoBERTa outperformed Phi-3.5 on Sentiment categorisation, though Phi-3.5 still gave good results.
Topic Classification
finBERT outperformed Phi-3.5 on Topic classification; again, this is not surprising, as finBERT is specifically designed for use with financial datasets.
2.4.3 QA Evasion, Sentiment Analysis and Topic Modelling on Ground Truth Dataset¶
2.4.3.0 Preparing ground truth data tables¶
def merge_datasets_for_plotting(finbert_folder, data_folder,
                                chunking=True, max_length=512,
                                bank="JPMorgan", appdx=""):
    path_qa_corr = os.path.join(finbert_folder, f"finbert_QA_topic_correlations{appdx}.csv")
    df_qa_corr = pd.read_csv(path_qa_corr)
    path_finbert_ground_truth = os.path.join(data_folder, f"finbert_topics_ground_truth_{bank}_chunking{chunking}_maxlength{max_length}_QA{appdx}.csv")
    df_finbert_ground_truth = pd.read_csv(path_finbert_ground_truth)
    merged = df_finbert_ground_truth.merge(df_qa_corr, on='uid')
    # Answers-only file path (currently unused here)
    path_finbert_ground_truth_answers_only = os.path.join(data_folder, f"finbert_topics_ground_truth_{bank}_chunking{chunking}_maxlength{max_length}{appdx}.csv")
    merged.to_csv(path_finbert_ground_truth, index=False)
# Load ground truth files (change to filepaths in Inputs/Outputs folders later)
ground_truth_df = pd.read_excel(processed_data_folder + "ground_truth_JPMorgan_manual.xlsx")
# Get RoBERTa ground truth output file
ground_truth_Roberta_result = pd.read_csv(output_data_folder + "sentiment_eval_result_ground_truth.csv")
# merging Q/A evasion correlations with Finbert + ground truth output
merge_datasets_for_plotting(output_data_folder, processed_data_folder,
chunking=False, max_length=512,
bank="JPMorgan", appdx="")
merge_datasets_for_plotting(output_data_folder, processed_data_folder,
chunking=False, max_length=512,
bank="JPMorgan", appdx="_summarised")
# Get finBERT ground truth output files for correlation scores
ground_truth_finBERT_result_corr = pd.read_csv(output_data_folder + "finbert_topics_ground_truth_JPMorgan_chunkingFalse_maxlength512.csv")
ground_truth_finBERT_result_corr = ground_truth_finBERT_result_corr[['uid','qa_corr']]
ground_truth_finBERT_result_corr_sum = pd.read_csv(output_data_folder + "finbert_topics_ground_truth_JPMorgan_chunkingFalse_maxlength512_summarised.csv")
ground_truth_finBERT_result_corr_sum = ground_truth_finBERT_result_corr_sum[['uid','qa_corr']]
# Get finBERT ground truth output files for Topics
ground_truth_finBERT_result_topics = pd.read_csv(output_data_folder + "finbert_topics_ground_truth_JPMorgan_chunkingFalse_maxlength512_QA.csv")
ground_truth_finBERT_result_topics_sum = pd.read_csv(output_data_folder + "finbert_topics_ground_truth_JPMorgan_chunkingFalse_maxlength512_QA_summarised.csv")
# Evasion and Topic Classification
# Re-name columns to distinguish columns between summarised and non-summarised results
ground_truth_finBERT_result_topics.rename(columns={'finbert_topic_label': 'finBERT Topic Classification'}, inplace=True)
ground_truth_finBERT_result_topics_sum.rename(columns={'finbert_topic_label': 'finBERT(summarised) Topic Classification'}, inplace=True)
ground_truth_finBERT_result_corr_sum.rename(columns={'qa_corr': 'qa_corr_summarised'}, inplace=True)
# Merge results columns from finBERT into ground truth table
ground_truth_df = pd.merge(ground_truth_df, ground_truth_finBERT_result_topics[['uid', 'finBERT Topic Classification']], on='uid', how='left')
ground_truth_df = pd.merge(ground_truth_df, ground_truth_finBERT_result_topics_sum[['uid', 'finBERT(summarised) Topic Classification']], on='uid', how='left')
ground_truth_df = pd.merge(ground_truth_df, ground_truth_finBERT_result_corr[['uid', 'qa_corr']], on='uid', how='left')
ground_truth_df = pd.merge(ground_truth_df, ground_truth_finBERT_result_corr_sum[['uid', 'qa_corr_summarised']], on='uid', how='left')
# Sentiment Analysis
# Merge results column from RoBERTa into ground truth table
ground_truth_df = pd.merge(ground_truth_df, ground_truth_Roberta_result[['uid', 'financial-roberta-large_sentiment']], on='uid', how='left')
# Capitalise each entry in sentiment column and re-name column
ground_truth_df['financial-roberta-large_sentiment'] = ground_truth_df['financial-roberta-large_sentiment'].str.capitalize()
ground_truth_df.rename(columns={'financial-roberta-large_sentiment': 'RoBERTa Sentiment'}, inplace=True)
# Tidy up non-aggregated df column names and create an aggregated df with combined Q&A on each row for evasion analysis
# Includes a manual check on the aggregated data for multiple entries caused by multi-part answers; these must be removed where possible for an accurate comparison
ground_truth_df, ground_truth_df_agg = aggregate_test_data(ground_truth_df)
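aggregate_test_data is defined earlier in the notebook; conceptually it pairs each Q row with its A row and combines them into a single QA row per qa_num. A minimal sketch of that combination (the behaviour and toy data are assumptions for illustration):

```python
import pandas as pd

# Toy Q/A pair sharing a qa_num, mimicking the per-row layout above
df = pd.DataFrame({
    "uid": ["t_Q_1", "t_A_1"],
    "qa_type": ["Q", "A"],
    "qa_num": [1, 1],
    "qa_text": ["What drove costs?", "Mainly technology investment."],
})

# Concatenate the Q text followed by the A text for each qa_num
agg = (df.sort_values("qa_type", ascending=False)  # 'Q' sorts before 'A' descending
         .groupby("qa_num", as_index=False)
         .agg(qa_text=("qa_text", " ".join)))
print(agg.loc[0, "qa_text"])  # "What drove costs? Mainly technology investment."
```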
# View data of multi-result fields in aggregated ground truth results - replace with result for longest text string in qa_text
# List of prefixes to search for in the 'uid' column
prefixes = [
"JPMorganChase_4Q23_A_30",
"JPMorganChase_3Q23_A_11",
"JPMorganChase_2Q23_A_23",
"JPMorganChase_2Q23_A_43",
"JPMorganChase_1Q23_A_42",
"JPMorganChase_3Q22_A_29",
"JPMorganChase_2Q22_A_1",
"JPMorganChase_2Q22_A_21",
"JPMorganChase_2Q22_A_41",
"JPMorganChase_4Q21_A_41",
"JPMorganChase_4Q21_A_43",
"JPMorganChase_3Q21_A_8",
"JPMorganChase_2Q21_A_24",
"JPMorganChase_2Q21_A_37"
]
manual_corrections = {
'uid': '',
'True Sentiment': '',
'RoBERTa Sentiment': '',
'True Topic': '',
'finBERT Topic Classification': '',
'finBERT(summarised) Topic Classification': ''
}
# Loop through each prefix and filter rows in the non-aggregated DataFrame
for prefix in prefixes:
    filtered_rows_non_agg = ground_truth_df[ground_truth_df['uid'].str.contains(prefix, regex=True)]
    # Find the row with the longest 'qa_text' for each prefix
    if not filtered_rows_non_agg.empty:
        longest_text_row = filtered_rows_non_agg.loc[filtered_rows_non_agg['qa_text'].str.len().idxmax()]
        # All prefixes are answer ('_A_') uids; map each to its aggregated '_QA_' uid
        updated_uid = prefix.replace('_A_', '_QA_') + ".0"
        # Update the ground_truth_df_agg with values from the non-agg DataFrame
        if updated_uid in ground_truth_df_agg['uid'].values:
            ground_truth_df_agg.loc[
                ground_truth_df_agg['uid'] == updated_uid,
                ['True Sentiment', 'RoBERTa Sentiment', 'True Topic', 'finBERT Topic Classification', 'finBERT(summarised) Topic Classification']
            ] = [
                longest_text_row['True Sentiment'],
                longest_text_row['RoBERTa Sentiment'],
                longest_text_row['True Topic'],
                longest_text_row['finBERT Topic Classification'],
                longest_text_row['finBERT(summarised) Topic Classification']
            ]
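The "keep the result for the longest qa_text" rule used in the loop above reduces to str.len().idxmax(). A standalone illustration with made-up rows:

```python
import pandas as pd

# Two duplicate entries for one answer uid (toy data, for illustration only)
df = pd.DataFrame({
    "uid": ["JPM_A_1", "JPM_A_1.dup"],
    "qa_text": ["short answer", "a considerably longer multi-part answer"],
})

# Select the row whose qa_text is longest
longest = df.loc[df["qa_text"].str.len().idxmax()]
print(longest["uid"])  # JPM_A_1.dup
```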
# Calculate evasion for finBERT (also re-sets index to uid)
ground_truth_df_agg = calculate_finBERT_evasion(ground_truth_df_agg)
print("Aggregated ground truth dataframe")
display(ground_truth_df_agg.head())
ground_truth_df = calculate_finBERT_evasion(ground_truth_df)
print("Ground truth dataframe")
display(ground_truth_df.head())
Aggregated ground truth dataframe
| qa_type | qa_num | qa_text | True Sentiment | True Topic | True Evasion Present | RoBERTa Sentiment | finBERT Evasion Present | finBERT Topic Classification | finBERT(summarised) Evasion Present | finBERT(summarised) Topic Classification | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| uid | |||||||||||
| JPMorganChase_3Q24_QA_28.0 | QA | 28 | And so Daniel's comments in September were on ... | Negative | Earnings | Not Evasive | Negative | Evasive | Financials | Evasive | Financials |
| JPMorganChase_2Q24_QA_26.0 | QA | 62 | Very good. And as a follow-up, you've been ver... | Positive | Financials | Not Evasive | Positive | Evasive | General News | Opinion | Evasive | Macro |
| JPMorganChase_1Q24_QA_30.0 | QA | 97 | Thank you. And I guess, as a tie-in to that qu... | Positive | M&A | Investments | Not Evasive | Neutral | Evasive | General News | Opinion | Evasive | Company | Product News |
| JPMorganChase_1Q24_QA_33.0 | QA | 100 | As a quick follow-up. Where are the next homer... | Negative | General News | Opinion | Not Evasive | Neutral | Evasive | General News | Opinion | Evasive | Company | Product News |
| JPMorganChase_4Q23_QA_30.0 | QA | 131 | Just to make sure I understood what you're say... | Negative | Earnings | Evasive | Neutral | Evasive | General News | Opinion | Evasive | Financials |
Ground truth dataframe
| qa_type | qa_num | qa_text | True Sentiment | True Topic | True Evasion Present | RoBERTa Sentiment | finBERT Evasion Present | finBERT(summarised) Evasion Present | finBERT Topic Classification | finBERT(summarised) Topic Classification | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| uid | |||||||||||
| JPMorganChase_3Q24_Q_28.0 | Q | 28 | And so Daniel's comments in September were on ... | Neutral | Earnings | Not Evasive | Neutral | Evasive | Evasive | General News | Opinion | Fed | Central Banks |
| JPMorganChase_3Q24_A_28.0 | A | 28 | No. Those were core NII or NII ex.. So again, ... | Negative | Earnings | NaN | Negative | Evasive | Evasive | Financials | Financials |
| JPMorganChase_2Q24_Q_26.0 | Q | 62 | Very good. And as a follow-up, you've been ver... | Neutral | Financials | Not Evasive | Neutral | Evasive | Evasive | General News | Opinion | General News | Opinion |
| JPMorganChase_2Q24_A_26.0 | A | 62 | Yeah. It's a good question. I think the short ... | Positive | Financials | NaN | Positive | Evasive | Evasive | General News | Opinion | Macro |
| JPMorganChase_1Q24_Q_30.0 | Q | 97 | Thank you. And I guess, as a tie-in to that qu... | Neutral | M&A | Investments | Not Evasive | Neutral | Evasive | Evasive | General News | Opinion | General News | Opinion |
2.4.3.1 Running Phi-3.5 on the ground truth dataset¶
"""Aggregated ground truth dataset - Evasion (aggregated ground truth)
Evasion for Phi must be run on an aggregated dataset, otherwise it will ignore the question.
"""
# Run Phi-3.5 function for Evasion on aggregated data
ground_truth_df_evasion = phi_question_answer(ground_truth_df_agg, evasion_questions, input_aggregated='Y', input_col='qa_text')
input_aggregated: Y 1/42 ... Keywords found: Basel III 5/42 ... 10/42 ... 15/42 ... 20/42 ... 25/42 ... 30/42 ... 35/42 ... 40/42 ... Keywords found: Basel III
| qa_type | qa_num | qa_text | True Sentiment | True Topic | True Evasion Present | RoBERTa Sentiment | finBERT Evasion Present | finBERT Topic Classification | finBERT(summarised) Evasion Present | finBERT(summarised) Topic Classification | keywords | Phi-3.5 Evasion Present | Phi-3.5 Evasion Degree | Phi-3.5 Evaded Topics | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| uid | |||||||||||||||
| JPMorganChase_3Q24_QA_28.0 | QA | 28 | And so Daniel's comments in September were on ... | Negative | Earnings | Not Evasive | Negative | Evasive | Financials | Evasive | Financials | None | Evasive | Moderate | 2024 consensus, 2025 consensus, NII ex. Market... |
| JPMorganChase_2Q24_QA_26.0 | QA | 62 | Very good. And as a follow-up, you've been ver... | Positive | Financials | Not Evasive | Positive | Evasive | General News | Opinion | Evasive | Macro | None | Not Evasive | Low | C&I charge-off rate, Underwriting process, Cre... |
| JPMorganChase_1Q24_QA_30.0 | QA | 97 | Thank you. And I guess, as a tie-in to that qu... | Positive | M&A | Investments | Not Evasive | Neutral | Evasive | General News | Opinion | Evasive | Company | Product News | None | Evasive | Moderate | company strategy, competitive dynamics, risk a... |
| JPMorganChase_1Q24_QA_33.0 | QA | 100 | As a quick follow-up. Where are the next homer... | Negative | General News | Opinion | Not Evasive | Neutral | Evasive | General News | Opinion | Evasive | Company | Product News | Basel III | Evasive | Moderate | bank failures, private equity, family offices,... |
| JPMorganChase_4Q23_QA_30.0 | QA | 131 | Just to make sure I understood what you're say... | Negative | Earnings | Evasive | Neutral | Evasive | General News | Opinion | Evasive | Financials | None | Evasive | Moderate | 2024 consensus, NII Markets, First Republic co... |
Time taken for 42 QA: 4.9 minutes Estimate for all transcripts: 0.73 hours
""" Sentiment and Topic Analysis
Non-aggregated ground truth dataset - Sentiment (non-aggregated ground truth) and Topics
For true comparability assessment here against RoBERTa, Phi Sentiment analysis is run on non-aggregated set
However, it is also run on aggregated set for assessment of the method of aggregating RoBERTa sentiments and finBERT topics for the QA_ground_truth set, which would be required to run the combined_questions prompt
"""
# Run Phi-3.5 function for Sentiment/ Topics
print("Sentiment analysis on Q&A ground truth table")
ground_truth_df_sentiment = phi_question_answer(ground_truth_df, sentiment_questions, input_aggregated='N', input_col='qa_text')
print("Topic analysis on Q&A ground truth table")
ground_truth_df_topics = phi_question_answer(ground_truth_df, finBERT_topic_questions, input_aggregated='N', input_col='qa_text')
print("Sentiment analysis on aggregated QA ground truth table")
ground_truth_df_sentiment_agg = phi_question_answer(ground_truth_df_agg, sentiment_questions, input_aggregated='Y', input_col='qa_text')
print("Topic analysis on aggregated QA ground truth table")
ground_truth_df_topics_agg = phi_question_answer(ground_truth_df_agg, finBERT_topic_questions, input_aggregated='Y', input_col='qa_text')
Sentiment analysis on Q&A ground truth table input_aggregated: N 1/108 ... 5/108 ... Keywords found: Basel III 10/108 ... 15/108 ... 20/108 ... 25/108 ... 30/108 ... 35/108 ... 40/108 ... 45/108 ... 50/108 ... 55/108 ... 60/108 ... 65/108 ... 70/108 ... 75/108 ... 80/108 ... 85/108 ... 90/108 ... 95/108 ... 100/108 ... Keywords found: Basel III 105/108 ...
| qa_type | qa_num | qa_text | True Sentiment | True Topic | True Evasion Present | RoBERTa Sentiment | finBERT Evasion Present | finBERT(summarised) Evasion Present | finBERT Topic Classification | finBERT(summarised) Topic Classification | keywords | Phi-3.5 Sentiment | Phi-3.5 Positive % | Phi-3.5 Negative % | Phi-3.5 Neutral % | Phi-3.5 Positive Topics | Phi-3.5 Negative Topics | Phi-3.5 Neutral Topics | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| uid | |||||||||||||||||||
| JPMorganChase_3Q24_Q_28.0 | Q | 28 | And so Daniel's comments in September were on ... | Neutral | Earnings | Not Evasive | Neutral | Evasive | Evasive | NaN | NaN | None | Positive | 60.0 | 20.0 | 20.0 | NII growth, 2024 consensus, Asset-Sensitive Nu... | Market volatility, Risk assessment, Economic u... | Financial context, Daniel's comments, Market c... |
| JPMorganChase_3Q24_A_28.0 | A | 28 | No. Those were core NII or NII ex.. So again, ... | Negative | Earnings | NaN | Negative | Evasive | Evasive | Financials | Financials | None | Negative | 10.0 | 80.0 | 10.0 | 2025 consensus, NII, Asset-Sensitive Number, M... | Insufficient Decline, Toppy Numbers, Sequentia... | Financial Context, Core NII, Market Analysis, ... |
| JPMorganChase_2Q24_Q_26.0 | Q | 62 | Very good. And as a follow-up, you've been ver... | Neutral | Financials | Not Evasive | Neutral | Evasive | Evasive | NaN | NaN | None | Positive | 70.0 | 10.0 | 20.0 | C&I portfolio, strong performance, elevated ra... | non-accrual loans increase, potential cracks, ... | context clarification, follow-up question, fin... |
| JPMorganChase_2Q24_A_26.0 | A | 62 | Yeah. It's a good question. I think the short ... | Positive | Financials | NaN | Positive | Evasive | Evasive | General News | Opinion | Macro | None | Positive | 70.0 | 0.0 | 30.0 | C&I charge-off rate, credit culture, underwrit... | current quarter's results, upward pressure, id... | |
| JPMorganChase_1Q24_Q_30.0 | Q | 97 | Thank you. And I guess, as a tie-in to that qu... | Neutral | M&A | Investments | Not Evasive | Neutral | Evasive | Evasive | NaN | NaN | None | Positive | 60.0 | 20.0 | 20.0 | JPMorgan, Private Credit Growth, Competitive E... | Competition, Business Bidding, Private Credit ... | Context, Question and Answer, Market Analysis |
Time taken for 108 QA: 12.21 minutes Estimate for all transcripts: 1.75 hours Topic analysis on Q&A ground truth table input_aggregated: N 1/108 ... 5/108 ... Keywords found: Basel III 10/108 ... 15/108 ... 20/108 ... 25/108 ... 30/108 ... 35/108 ... 40/108 ... 45/108 ... 50/108 ... 55/108 ... 60/108 ... 65/108 ... 70/108 ... 75/108 ... 80/108 ... 85/108 ... 90/108 ... 95/108 ... 100/108 ... Keywords found: Basel III 105/108 ...
| qa_type | qa_num | qa_text | True Sentiment | True Topic | True Evasion Present | RoBERTa Sentiment | finBERT Evasion Present | finBERT(summarised) Evasion Present | finBERT Topic Classification | finBERT(summarised) Topic Classification | keywords | Phi-3.5 Topic Classification | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| uid | |||||||||||||
| JPMorganChase_3Q24_Q_28.0 | Q | 28 | And so Daniel's comments in September were on ... | Neutral | Earnings | Not Evasive | Neutral | Evasive | Evasive | NaN | NaN | None | Markets |
| JPMorganChase_3Q24_A_28.0 | A | 28 | No. Those were core NII or NII ex.. So again, ... | Negative | Earnings | NaN | Negative | Evasive | Evasive | Financials | Financials | None | Financials |
| JPMorganChase_2Q24_Q_26.0 | Q | 62 | Very good. And as a follow-up, you've been ver... | Neutral | Financials | Not Evasive | Neutral | Evasive | Evasive | NaN | NaN | None | Financials |
| JPMorganChase_2Q24_A_26.0 | A | 62 | Yeah. It's a good question. I think the short ... | Positive | Financials | NaN | Positive | Evasive | Evasive | General News | Opinion | Macro | None | Earnings |
| JPMorganChase_1Q24_Q_30.0 | Q | 97 | Thank you. And I guess, as a tie-in to that qu... | Neutral | M&A | Investments | Not Evasive | Neutral | Evasive | Evasive | NaN | NaN | None | General News |
Time taken for 108 QA: 12.19 minutes. Estimate for all transcripts: 1.74 hours.
Sentiment analysis on aggregated QA ground truth table (input_aggregated: Y)
[progress counters 1/42 … 40/42 trimmed; "Keywords found: Basel III" logged at 1/42 and 40/42]
| qa_type | qa_num | qa_text | True Sentiment | True Topic | True Evasion Present | RoBERTa Sentiment | finBERT Evasion Present | finBERT Topic Classification | finBERT(summarised) Evasion Present | finBERT(summarised) Topic Classification | keywords | Phi-3.5 Sentiment | Phi-3.5 Positive % | Phi-3.5 Negative % | Phi-3.5 Neutral % | Phi-3.5 Positive Topics | Phi-3.5 Negative Topics | Phi-3.5 Neutral Topics | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| uid | |||||||||||||||||||
| JPMorganChase_3Q24_QA_28.0 | QA | 28 | And so Daniel's comments in September were on ... | Negative | Earnings | Not Evasive | Negative | Evasive | Financials | Evasive | Financials | None | Negative | 10.0 | 80.0 | 10.0 | NII Ex. Markets, Consensus, Decline Year-on-Ye... | Insufficient Decline, Toppy Consensus, Sequent... | Financial Context, Core NII, 2024 Consensus, 2... |
| JPMorganChase_2Q24_QA_26.0 | QA | 62 | Very good. And as a follow-up, you've been ver... | Positive | Financials | Not Evasive | Positive | Evasive | General News | Opinion | Evasive | Macro | None | Positive | 80.0 | 20.0 | 0.0 | C&I portfolio, strong performance, low charge-... | non-accrual loans, potential cracks, upward pr... | context, follow-up, future outlook |
| JPMorganChase_1Q24_QA_30.0 | QA | 97 | Thank you. And I guess, as a tie-in to that qu... | Positive | M&A | Investments | Not Evasive | Neutral | Evasive | General News | Opinion | Evasive | Company | Product News | None | Positive | 80.0 | 20.0 | 0.0 | private credit growth, competitive position, v... | competitive dynamics, concentration risks | context, balance, tension |
| JPMorganChase_1Q24_QA_33.0 | QA | 100 | As a quick follow-up. Where are the next homer... | Negative | General News | Opinion | Not Evasive | Neutral | Evasive | General News | Opinion | Evasive | Company | Product News | Basel III | Positive | 60.0 | 20.0 | 20.0 | capital deployment, earnings in store, well-po... | bank failures, regional bank failures, competi... | bank resolution process, FDIC's attitude, capi... |
| JPMorganChase_4Q23_QA_30.0 | QA | 131 | Just to make sure I understood what you're say... | Negative | Earnings | Evasive | Neutral | Evasive | General News | Opinion | Evasive | Financials | None | Neutral | 20.0 | 30.0 | 50.0 | NII, 2024 consensus, First Republic, Calendari... | NII pull to par, First Republic guidance, Fund... | Market change, First Republic contribution, An... |
Time taken for 42 QA: 4.84 minutes. Estimate for all transcripts: 0.72 hours.
Topic analysis on aggregated QA ground truth table (input_aggregated: Y)
[progress counters 1/42 … 40/42 trimmed; "Keywords found: Basel III" logged at 1/42 and 40/42]
| qa_type | qa_num | qa_text | True Sentiment | True Topic | True Evasion Present | RoBERTa Sentiment | finBERT Evasion Present | finBERT Topic Classification | finBERT(summarised) Evasion Present | finBERT(summarised) Topic Classification | keywords | Phi-3.5 Topic Classification | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| uid | |||||||||||||
| JPMorganChase_3Q24_QA_28.0 | QA | 28 | And so Daniel's comments in September were on ... | Negative | Earnings | Not Evasive | Negative | Evasive | Financials | Evasive | Financials | None | Financials |
| JPMorganChase_2Q24_QA_26.0 | QA | 62 | Very good. And as a follow-up, you've been ver... | Positive | Financials | Not Evasive | Positive | Evasive | General News | Opinion | Evasive | Macro | None | Earnings |
| JPMorganChase_1Q24_QA_30.0 | QA | 97 | Thank you. And I guess, as a tie-in to that qu... | Positive | M&A | Investments | Not Evasive | Neutral | Evasive | General News | Opinion | Evasive | Company | Product News | None | General News |
| JPMorganChase_1Q24_QA_33.0 | QA | 100 | As a quick follow-up. Where are the next homer... | Negative | General News | Opinion | Not Evasive | Neutral | Evasive | General News | Opinion | Evasive | Company | Product News | Basel III | General News |
| JPMorganChase_4Q23_QA_30.0 | QA | 131 | Just to make sure I understood what you're say... | Negative | Earnings | Evasive | Neutral | Evasive | General News | Opinion | Evasive | Financials | None | Financials |
Time taken for 42 QA: 4.92 minutes Estimate for all transcripts: 0.73 hours
# Merge unique columns into single aggregated table result
unique_columns_in_topics_agg = [
col for col in ground_truth_df_topics_agg.columns if col not in ground_truth_df_evasion.columns
]
unique_columns_in_sentiment_agg = [
col for col in ground_truth_df_sentiment_agg.columns if col not in ground_truth_df_evasion.columns
]
ground_truth_df_all_results_agg = pd.merge(ground_truth_df_evasion, ground_truth_df_topics_agg[unique_columns_in_topics_agg], left_index=True, right_on="uid", how="outer")
ground_truth_df_all_results_agg = pd.merge(ground_truth_df_all_results_agg, ground_truth_df_sentiment_agg[unique_columns_in_sentiment_agg], left_index=True, right_on="uid", how="outer")
print("Aggregated data results table (all analyses on aggregated QA set):")
display(ground_truth_df_all_results_agg.head())
# Merge unique columns into single non-aggregated table result
# Note that finBERT classification only has results for answers
unique_columns_in_topics = [
col for col in ground_truth_df_topics.columns if col not in ground_truth_df_sentiment.columns
]
ground_truth_df_all_results = pd.merge(ground_truth_df_sentiment, ground_truth_df_topics[unique_columns_in_topics], left_index=True, right_on="uid", how="outer")
print("Sentiment/Topics results table (all analyses on non-aggregated Q&A set):")
display(ground_truth_df_all_results.head())
Aggregated data results table (all analyses on aggregated QA set):
| qa_type | qa_num | qa_text | True Sentiment | True Topic | True Evasion Present | RoBERTa Sentiment | finBERT Evasion Present | finBERT Topic Classification | finBERT(summarised) Evasion Present | ... | Phi-3.5 Evasion Degree | Phi-3.5 Evaded Topics | Phi-3.5 Topic Classification | Phi-3.5 Sentiment | Phi-3.5 Positive % | Phi-3.5 Negative % | Phi-3.5 Neutral % | Phi-3.5 Positive Topics | Phi-3.5 Negative Topics | Phi-3.5 Neutral Topics | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| uid | |||||||||||||||||||||
| JPMorganChase_1Q22_QA_20.0 | QA | 408 | That doesn't sound so bad. Yeah. No. I mean, G... | Positive | Financials | Evasive | Neutral | Not Evasive | General News | Opinion | Evasive | ... | Moderate | 2024 consensus, NII Markets, Asset-Sensitive N... | General News | Positive | 70.0 | 10.0 | 20.0 | no concerns, spot metrics, not seeing anything... | complacent, risks, around the corner | context, Glenn, current situation, metrics, en... |
| JPMorganChase_1Q22_QA_38.0 | QA | 426 | Okay. And then maybe just skipping over to tra... | Neutral | Markets | Evasive | Neutral | Evasive | General News | Opinion | Evasive | ... | Moderate | Trading performance prediction, March quarter ... | Fed | Positive | 60.0 | 0.0 | 40.0 | stronger quarter, trading, QT normalization, M... | context, trading strategy, market expectations... | |
| JPMorganChase_1Q22_QA_9.0 | QA | 397 | Okay. And getting to double digits is over the... | Neutral | Company | Product News | Not Evasive | Neutral | Evasive | General News | Opinion | Not Evasive | ... | Moderate | Timeframe, Pace of growth, Specific growth rat... | Financials | Positive | 70.0 | 0.0 | 30.0 | double digits growth, 7% target, accelerate, I... | context, timeframe, pace, financial update | |
| JPMorganChase_1Q23_QA_18.0 | QA | 235 | In your comments about your CET1 ratio, obviou... | Negative | Fed | Central Banks | Not Evasive | Negative | Evasive | Fed | Central Banks | Evasive | ... | None | N/A | Financials | Positive | 60.0 | 20.0 | 20.0 | CET1 ratio, GSIB buffer, Stress Capital Buffer... | SCB prediction, Industry surprise, Credit shocks | Operating assumptions, Planning purposes, SCB ... |
| JPMorganChase_1Q23_QA_19.0 | QA | 236 | Sure. And then just as a follow-up, if I heard... | Neutral | Financials | Not Evasive | Neutral | Evasive | General News | Opinion | Evasive | ... | Moderate | Commercial Real Estate, Single Name Items, Cor... | Earnings | Positive | 60.0 | 20.0 | 20.0 | loan loss reserve build, one-off credits, Corp... | larger credits, potential risk, single name items | context clarification, follow-up question, fin... |
5 rows Γ 23 columns
Sentiment/Topics results table (all analyses on non-aggregated Q&A set):
| qa_type | qa_num | qa_text | True Sentiment | True Topic | True Evasion Present | RoBERTa Sentiment | finBERT Evasion Present | finBERT(summarised) Evasion Present | finBERT Topic Classification | finBERT(summarised) Topic Classification | keywords | Phi-3.5 Sentiment | Phi-3.5 Positive % | Phi-3.5 Negative % | Phi-3.5 Neutral % | Phi-3.5 Positive Topics | Phi-3.5 Negative Topics | Phi-3.5 Neutral Topics | Phi-3.5 Topic Classification | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| uid | ||||||||||||||||||||
| JPMorganChase_1Q22_A_20.0 | A | 408 | Yeah. No. I mean, Glenn, I think, look, no one... | Positive | Financials | NaN | Neutral | Not Evasive | Evasive | General News | Opinion | Macro | None | Positive | 80.0 | 0.0 | 20.0 | no concerns, spot metrics, not seeing anything... | complacent, environment, looking around the co... | General News | |
| JPMorganChase_1Q22_A_38.0 | A | 426 | Yeah. I mean, you know that we're going to be ... | Neutral | Markets | NaN | Neutral | Evasive | Evasive | General News | Opinion | Macro | None | Negative | 0.0 | 100.0 | 0.0 | N/A | Reluctance, Predicting, Trading Performance, U... | Context, Financial Sentiment, Analysis, Foreca... | General News |
| JPMorganChase_1Q22_A_9.0 | A | 397 | I wasn't meaning to put a timeframe on it. But... | Neutral | Company | Product News | NaN | Neutral | Evasive | Not Evasive | General News | Opinion | Company | Product News | None | Positive | 70.0 | 0.0 | 30.0 | Investor Day, five-year plan, strategic growth... | timeframe, update frequency, financial context | General News | |
| JPMorganChase_1Q22_Q_20.0 | Q | 408 | That doesn't sound so bad. | Neutral | General News | Opinion | Evasive | Neutral | Evasive | Evasive | NaN | NaN | None | Positive | 70.0 | 0.0 | 30.0 | Asset-Sensitive Number, 2024 Consensus, Market... | Financial Analysis, Market Predictions, Econom... | Earnings | |
| JPMorganChase_1Q22_Q_38.0 | Q | 426 | Okay. And then maybe just skipping over to tra... | Neutral | Markets | Evasive | Positive | Evasive | Evasive | NaN | NaN | None | Positive | 70.0 | 10.0 | 20.0 | Trading Strength, QT Normalization, Fed Volati... | Fed Concerns, Market Volatility, QT Impact, In... | Context Clarification, Market Analysis, Tradin... | Fed |
ground_truth_df_all_results.to_csv(output_data_folder + 'phi_ground_truth_results.csv')
ground_truth_df_all_results_agg.to_csv(output_data_folder + 'phi_ground_truth_results_agg.csv')
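The merges above keep only the columns each results table adds, which avoids pandas appending `_x`/`_y` suffixes to columns shared between frames. A minimal sketch of the pattern on toy frames (column names hypothetical):

```python
import pandas as pd

left = pd.DataFrame({"qa_text": ["q1", "q2"], "sentiment": ["Positive", "Negative"]},
                    index=pd.Index(["u1", "u2"], name="uid"))
right = pd.DataFrame({"qa_text": ["q1", "q2"], "topic": ["Earnings", "Markets"]},
                     index=pd.Index(["u1", "u2"], name="uid"))

# Keep only the columns the right frame adds, so shared columns like 'qa_text'
# are not duplicated with _x/_y suffixes after the merge.
unique_cols = [c for c in right.columns if c not in left.columns]
merged = pd.merge(left, right[unique_cols], left_index=True, right_index=True, how="outer")
print(merged.columns.tolist())  # ['qa_text', 'sentiment', 'topic']
```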
2.4.3.2 Compare results with other models (Output)
# Assess validity of aggregating RoBERTa and finBERT results (needed for direct comparison with Evasive results later)
print("Non-aggregated answers:")
accuracy_results_gt = calculate_accuracy(ground_truth_df_all_results, sentiment_flag='Y', evasion_flag='N', topic_flag='Y')
Non-aggregated answers:
Topic Accuracy Results (Significance Threshold 5.0%):
  finBERT Topic Classification: 12.04%
  finBERT(summarised) Topic Classification: 25.93%
  Phi-3.5 Topic Classification: 25.0%
Sentiment Accuracy Results (Significance Threshold 33.33%):
  RoBERTa Sentiment: 66.67%
  Phi-3.5 Sentiment: 38.89%
print("Aggregated answer Results:")
accuracy_results_gt_agg = calculate_accuracy(ground_truth_df_all_results_agg, sentiment_flag='Y', evasion_flag='Y', topic_flag='Y')
Aggregated answer Results:
Topic Accuracy Results (Significance Threshold 5.0%):
  finBERT Topic Classification: 9.52%
  finBERT(summarised) Topic Classification: 28.57%
  Phi-3.5 Topic Classification: 23.81%
Sentiment Accuracy Results (Significance Threshold 33.33%):
  RoBERTa Sentiment: 61.9%
  Phi-3.5 Sentiment: 35.71%
Evasion Accuracy Results (Significance Threshold 50.0%):
  finBERT Evasion Present: 40.48%
  finBERT(summarised) Evasion Present: 50.0%
  Phi-3.5 Evasion Present: 61.9%
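The "significance thresholds" in these outputs appear to be random-chance baselines (1/k for k classes, e.g. 33.33% for three sentiment labels and 50% for the two evasion labels). A minimal, hypothetical sketch of such an accuracy check; the actual `calculate_accuracy` helper is defined earlier in the notebook and may differ:

```python
import pandas as pd

def label_accuracy(df: pd.DataFrame, pred_col: str, true_col: str) -> float:
    """Share of rows whose predicted label matches the ground-truth label.

    NaN predictions never equal the truth, so missing model output counts as wrong.
    """
    return (df[pred_col] == df[true_col]).sum() / len(df)

def chance_baseline(n_classes: int) -> float:
    """Accuracy of uniform random guessing over n_classes labels."""
    return 1.0 / n_classes

demo = pd.DataFrame({
    "True Sentiment": ["Positive", "Negative", "Neutral", "Positive"],
    "RoBERTa Sentiment": ["Positive", "Neutral", "Neutral", "Positive"],
})
acc = label_accuracy(demo, "RoBERTa Sentiment", "True Sentiment")
print(f"accuracy: {acc:.2%} vs chance: {chance_baseline(3):.2%}")  # accuracy: 75.00% vs chance: 33.33%
```

A model is only informative to the extent its accuracy clears this chance baseline.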
Topic Classification
Aggregating answers improved finBERT performance on summarised texts, with the aggregated, summarised answers giving the best accuracy (28.6%). (Where a question had more than one answer, finBERT results were aggregated by taking the topic class of the longest answer.) The higher accuracy on summarised data likely reflects Phi-3.5's removal of irrelevant (non-financial) language from the text, while aggregation gives finBERT more content to draw on, supporting more accurate topic classification.
Phi-3.5 performance was slightly worsened by summarisation, which likely reflects that, as a generalist language model, it performs better with larger context.
Therefore: finBERT(summarised) with answer aggregation is preferable to Phi-3.5 for further classification analysis.
Sentiment
The opposite effect (decreased performance) was seen for answer-aggregated sentiment analysis, where the sentiment of the longest answer was taken as the sentiment of the whole answer when multiple answers were present: RoBERTa accuracy decreased from 66.7% to 61.9%.
Therefore: it is preferable to use RoBERTa without answer aggregation, but for Negative Evasion analysis (which requires aggregated answers) RoBERTa is still preferable to Phi-3.5.
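The longest-answer aggregation rule described above can be sketched in a few lines (column names hypothetical; the notebook's own aggregation lives in its helper functions):

```python
import pandas as pd

def longest_answer_label(answers: pd.DataFrame, text_col: str, label_col: str) -> str:
    """Take the label of the longest answer as the label for the whole group."""
    idx = answers[text_col].str.len().idxmax()
    return answers.loc[idx, label_col]

# Two answers to the same question: the longer one decides the aggregated label.
demo = pd.DataFrame({
    "qa_text": ["Short answer.", "A much longer, more detailed answer about the NII outlook."],
    "sentiment": ["Neutral", "Negative"],
})
print(longest_answer_label(demo, "qa_text", "sentiment"))  # Negative
```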
Question-Evasion
Question-evasion analysis can only be performed by Phi-3.5, and only on the aggregated QA set; it outperforms all other models with 62% accuracy.
Evasion accuracy improved markedly when the evasion prompt was run separately from the sentiment prompt. However, this makes analysing full transcript sets of Q&As unviable: running the analysis multiple times, once per prompt, would negate the time/resource savings that motivated using Phi as a replacement for other analysis methods.
As Phi is the most accurate question-evasion method tested, we apply it to Q&As from transcripts covering specific quarters of interest (identified in the EDA section).
3. Analysis
3.0 Running model on the full dataset to get responses
3.0.1 RoBERTa sentiment
full_data = pd.read_csv(processed_data_folder + "/transcripts_tabular_JPMorgan_clean.csv")
full_data = full_data[full_data['qa_type'].isin(['Q', 'A'])]
roberta_model = ClassificationModel("financial-roberta-large", "soleimanian/financial-roberta-large-sentiment")
full_data = roberta_model.get_model_response_for_df(full_data, 'qa_text')
full_data.to_csv(output_data_folder + "/sentiment_full_result.csv", index=False)
3.0.2 Phi-3.5 by quarter
qa_df = pd.read_csv(processed_data_folder + "/transcripts_tabular_JPMorgan_clean.csv")
"""
Interrogate QA pairs from specific quarters of the pre-processed table for sentiment and question-evasion analysis (with associated non-categorical topics).
This analysis was only performed on the aggregated dataset, in order to analyse Phi-3.5 identified Negative Evasive topics
Note that a more comprehensive approach would analyse the quarters separately using RoBERTa, and then compare these sentiments to Phi-3.5-identified Evasion.
However, for a simple exploration of the capabilities of Phi-3.5, a Phi-3.5-only approach was used.
"""
# Define quarters of interest
quarters_of_interest = ['1Q22', '2Q24', '3Q24']
# Filter the DataFrame for rows where the 'uid' column contains any of the quarters of interest
quarter_filtered_df = qa_df[qa_df['uid'].str.contains('|'.join(quarters_of_interest))]
# Aggregate df for Phi3.5 evasion analysis
agg_quarters_df = aggregate_QA_data(quarter_filtered_df)
Original DataFrame: 167 rows
| uid | bank | year | quarter | date | section | name | title | firm | qa_type | qa_num_within | qa_num | qa_text | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | JPMorganChase_3Q24_Q_1.0 | JPMorganChase | 2024 | 3 | 2024-10-11 | questions_answers | Jim Mitchell | Analyst | Seaport Global Securities LLC | Q | 1.0 | 1.0 | Hey, good morning. So, Jeremy, as you highligh... |
| 2 | JPMorganChase_3Q24_A_1.0 | JPMorganChase | 2024 | 3 | 2024-10-11 | questions_answers | Jeremy Barnum | Chief Financial Officer | JPMorgan Chase & Co. | A | 1.0 | 1.0 | Yeah. Sure, Jim. I'll try to answer both quest... |
| 3 | JPMorganChase_3Q24_Q_3.0 | JPMorganChase | 2024 | 3 | 2024-10-11 | questions_answers | Steven Chubak | Analyst | Wolfe Research LLC | Q | 3.0 | 3.0 | Hi. Good morning. So Jeremy, how are you? So I... |
| 4 | JPMorganChase_3Q24_A_3.0 | JPMorganChase | 2024 | 3 | 2024-10-11 | questions_answers | Jeremy Barnum | Chief Financial Officer | JPMorgan Chase & Co. | A | 3.0 | 3.0 | Sure. So good question and I agree with your n... |
| 5 | JPMorganChase_3Q24_A_3.1 | JPMorganChase | 2024 | 3 | 2024-10-11 | questions_answers | Jamie Dimon | Chairman & Chief Executive Officer | JPMorgan Chase & Co. | A | 3.0 | 3.0 | And can I just give you just a view of expense... |
Aggregated QA DataFrame: 73 rows
| qa_type | qa_num | qa_text | |
|---|---|---|---|
| uid | |||
| JPMorganChase_3Q24_QA_1.0 | QA | 1.0 | Hey, good morning. So, Jeremy, as you highligh... |
| JPMorganChase_3Q24_QA_3.0 | QA | 3.0 | Hi. Good morning. So Jeremy, how are you? So I... |
| JPMorganChase_3Q24_QA_4.0 | QA | 4.0 | Thank you both for the color. Just a quick fol... |
| JPMorganChase_3Q24_QA_6.0 | QA | 6.0 | My first question, and thank you very much for... |
| JPMorganChase_3Q24_QA_7.0 | QA | 7.0 | And if I can ask my second question, and, Jami... |
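The quarter filter above matches any uid containing one of the quarters of interest. A quick standalone check of the same pattern on toy uids:

```python
import pandas as pd

quarters_of_interest = ['1Q22', '2Q24', '3Q24']
uids = pd.Series([
    "JPMorganChase_1Q22_QA_1.0",
    "JPMorganChase_4Q23_QA_30.0",
    "JPMorganChase_3Q24_QA_28.0",
])
# '|'.join builds the regex '1Q22|2Q24|3Q24'; str.contains keeps any uid matching one of them.
# (Safe here because the quarter strings contain no regex metacharacters.)
mask = uids.str.contains('|'.join(quarters_of_interest))
print(uids[mask].tolist())
```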
# Run Phi-3.5 function for Evasion and Sentiment analysis (both create non-categorical topics for each sentiment and evasion)
print(f'Running Evasion analysis on {quarters_of_interest}')
quarter_evasion_df = phi_question_answer(agg_quarters_df, evasion_questions, input_aggregated='Y', input_col='qa_text')
print(f'Running Sentiment analysis on {quarters_of_interest}')
quarter_sentiment_df = phi_question_answer(agg_quarters_df, sentiment_questions, input_aggregated='Y', input_col='qa_text')
Running Evasion analysis on ['1Q22', '2Q24', '3Q24']
input_aggregated: Y
[progress counters 1/73 … 70/73 trimmed; "Keywords found: Basel III" logged four times, "Keywords found: PRA, Basel III" once]
| qa_type | qa_num | qa_text | keywords | Phi-3.5 Evasion Present | Phi-3.5 Evasion Degree | Phi-3.5 Evaded Topics | |
|---|---|---|---|---|---|---|---|
| uid | |||||||
| JPMorganChase_3Q24_QA_1.0 | QA | 1.0 | Hey, good morning. So, Jeremy, as you highligh... | None | Evasive | Moderate | Yield Curve, Deposit Behavior, Pricing, NII Tr... |
| JPMorganChase_3Q24_QA_3.0 | QA | 3.0 | Hi. Good morning. So Jeremy, how are you? So I... | None | Evasive | Moderate | expense forecast, core expense base, annualiza... |
| JPMorganChase_3Q24_QA_4.0 | QA | 4.0 | Thank you both for the color. Just a quick fol... | None | Not Evasive | None | N/A |
| JPMorganChase_3Q24_QA_6.0 | QA | 6.0 | My first question, and thank you very much for... | None | Evasive | Moderate | 2025 consensus, NII Markets, Yield Curve, Empi... |
| JPMorganChase_3Q24_QA_7.0 | QA | 7.0 | And if I can ask my second question, and, Jami... | None | Evasive | Moderate | Capital Deployment, Direct Lending, Innovation... |
Time taken for 73 QA: 8.75 minutes. Estimate for all transcripts: 0.75 hours.
Running Sentiment analysis on ['1Q22', '2Q24', '3Q24']
input_aggregated: Y
[progress counters 1/73 … 70/73 trimmed; "Keywords found: Basel III" logged four times, "Keywords found: PRA, Basel III" once]
| qa_type | qa_num | qa_text | keywords | Phi-3.5 Sentiment | Phi-3.5 Positive % | Phi-3.5 Negative % | Phi-3.5 Neutral % | Phi-3.5 Positive Topics | Phi-3.5 Negative Topics | Phi-3.5 Neutral Topics | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| uid | |||||||||||
| JPMorganChase_3Q24_QA_1.0 | QA | 1.0 | Hey, good morning. So, Jeremy, as you highligh... | None | Negative | 20.0 | 60.0 | 20.0 | NII growth, sequential increase, trough predic... | sequential decline, yield curve, deposit behavior | Fed rate cuts, yield-seeking behavior, QT end |
| JPMorganChase_3Q24_QA_3.0 | QA | 3.0 | Hi. Good morning. So Jeremy, how are you? So I... | None | Positive | 70.0 | 20.0 | 10.0 | Investment Growth, Strategy Execution, Cost Ef... | Expense Forecast, Inflation, Annualization, Ex... | Expense Guidance, Budget Cycle, Investor Day |
| JPMorganChase_3Q24_QA_4.0 | QA | 4.0 | Thank you both for the color. Just a quick fol... | None | Positive | 60.0 | 20.0 | 20.0 | NII, Yield Expansion, Policy Rate, Duration Ma... | Curve Inversion, Extreme Scenarios, Rate Cuts,... | Excess Reserves, Fed, Securities, Forward Curv... |
| JPMorganChase_3Q24_QA_6.0 | QA | 6.0 | My first question, and thank you very much for... | None | Negative | 30.0 | 40.0 | 30.0 | NII, Consensus, Growth Resume | Yield Curve, Fed Cuts, EaR Adjustment | Model, Empirical, Deposit Betas |
| JPMorganChase_3Q24_QA_7.0 | QA | 7.0 | And if I can ask my second question, and, Jami... | None | Positive | 60.0 | 20.0 | 20.0 | capital generation, return on equity, sharehol... | market volatility, asset inflation, cautious a... | quarterly earnings, capital deployment, shareh... |
Time taken for 73 QA: 8.56 minutes Estimate for all transcripts: 0.73 hours
# Merge results back into single dataframe
quarter_sentiment_df = quarter_sentiment_df.drop(columns=['qa_type', 'qa_num', 'qa_text','keywords'])
quarter_df_results = pd.merge(quarter_evasion_df, quarter_sentiment_df, left_index=True, right_index=True, how="outer")
# Extract quarter and year information and add to dataframe
quarter_df_results['quarter_ID'] = quarter_df_results.index.to_series().str.extract(r'_(\dQ\d{2})_')
quarter_df_results['year'] = quarter_df_results['quarter_ID'].str[-2:].apply(lambda x: '20' + x)
display(quarter_df_results.head())
| qa_type | qa_num | qa_text | keywords | Phi-3.5 Evasion Present | Phi-3.5 Evasion Degree | Phi-3.5 Evaded Topics | Phi-3.5 Sentiment | Phi-3.5 Positive % | Phi-3.5 Negative % | Phi-3.5 Neutral % | Phi-3.5 Positive Topics | Phi-3.5 Negative Topics | Phi-3.5 Neutral Topics | quarter_ID | year | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| uid | ||||||||||||||||
| JPMorganChase_1Q22_QA_1.0 | QA | 389 | Thank you. Morning, Jeremy. Was wondering abou... | NaN | Evasive | Moderate | Net Interest Income Expectations, Fed Hike Imp... | Positive | 60 | 40 | 0 | Net Interest Income, Excess, Increase, Investo... | Rate Expectations, Fed Hike, Front End of the ... | NaN | 1Q22 | 2022 |
| JPMorganChase_1Q22_QA_10.0 | QA | 398 | Okay. And then just the follow up here is on t... | NaN | Evasive | Moderate | NII Outlook, Reinvestment Strategy, Investment... | Positive | 70 | 10 | 20 | NII outlook, Investor Day, Confidence, 17% ROT... | Rate Environment, Short-term Headwinds, Fed's ... | Reinvestment, Historical Practice, Future Spen... | 1Q22 | 2022 |
| JPMorganChase_1Q22_QA_12.0 | QA | 400 | Hey. Good morning. So wanted to start off with... | NaN | Evasive | Moderate | Fed balance sheet reduction, Deposit outflow e... | Neutral | 10 | 40 | 50 | Fed policy, Deposit outflow, Industry perspect... | Fed balance sheet reduction, Deposit outflow r... | Financial context, Question, Fed outlining, Ex... | 1Q22 | 2022 |
| JPMorganChase_1Q22_QA_13.0 | QA | 401 | Hey, Steve. So this is a fun question. So let'... | NaN | Evasive | Moderate | company's strategy, impact of QT, effect on ba... | Positive | 60 | 20 | 20 | NII growth, robust loan growth, share wins, pi... | QT headwind, bill maturity decisions, short-da... | QE cycle, RRP dynamics, Fed minutes, system-wi... | 1Q22 | 2022 |
| JPMorganChase_1Q22_QA_14.0 | QA | 402 | No, that's really helpful color. Thanks for al... | NaN | Not Evasive | NaN | NaN | Positive | 60 | 30 | 10 | Fed hikes, Market Underestimation, Excess Liqu... | Inflation Concerns, Market Volatility, QT Impa... | Shareholder Letter, Investment Portfolio, Cust... | 1Q22 | 2022 |
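The quarter/year extraction above can be checked in isolation with the same regex (uid value taken from the table above):

```python
import re

# Same pattern as the str.extract call above: the quarter sits between underscores, e.g. '_1Q22_'.
pattern = re.compile(r'_(\dQ\d{2})_')

uid = "JPMorganChase_1Q22_QA_1.0"
quarter_id = pattern.search(uid).group(1)
year = '20' + quarter_id[-2:]
print(quarter_id, year)  # 1Q22 2022
```

Note that the two-digit year expansion assumes all transcripts are from the 2000s, which holds for this dataset.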
quarter_df_results.to_csv(output_data_folder + "/phi_quarter_results.csv")
3.1 22Q1
We run a small number of analyses on 22Q1 as a proof of concept. More detailed analysis is performed on 24Q2 and 24Q3 to identify emerging risks.
3.1.1 Sentiment
full_result = pd.read_csv(output_data_folder + "/sentiment_full_result.csv")
full_result['date'] = pd.to_datetime(full_result['date'])
score_by_quarter = full_result.groupby('date')['financial-roberta-large_score'].mean()
year_quarter = full_result[['date', 'year', 'quarter']].drop_duplicates()
score_by_quarter = pd.merge(score_by_quarter, year_quarter, on='date', how='left')
score_by_quarter['year_quarter'] = score_by_quarter['quarter'].astype(str) + 'Q' + (score_by_quarter['year']-2000).astype(str)
score_by_quarter.set_index('year_quarter', inplace=True)
fig, ax1 = plt.subplots(figsize=(10,3))
sns.lineplot(x=score_by_quarter.index, y=score_by_quarter['financial-roberta-large_score'], ax=ax1,
color='orange', lw=2, label='average sentiment', legend=False)
xticks = [f"{row['quarter']}Q{row['year']-2000}" for _, row in score_by_quarter.iterrows()]
plt.xticks(ticks=score_by_quarter.index, labels=xticks, rotation=25)
ax1.set_xlabel('')
ax1.set_ylabel('Average sentiment score', fontsize=14)
ax1.set_ylim(-0.3, 0.3)
for label in ax1.get_yticklabels():
label.set_size(fontsize=14)
plt.axhline(y=0, lw=0.8, ls=(0, (5,10)), color='black')
ax1.axvspan(2,4, facecolor='orange', alpha=0.2, edgecolor=None)
ax1.set_title("Average sentiment score over time")
for label in ax1.get_xticklabels():
label.set_size(fontsize=12)
fig.legend(fontsize=14, loc='center left', bbox_to_anchor=(0.9, 0.5))
quarter_count = full_result.groupby(['year', 'quarter'])['financial-roberta-large_sentiment'].value_counts(normalize=True).reset_index()
quarter_count = quarter_count.pivot(index=['year', 'quarter'], columns='financial-roberta-large_sentiment', values='proportion').reset_index()
fig, ax1 = plt.subplots(figsize=(10,3))
sns.lineplot(x=quarter_count['quarter'].astype(str)+'Q'+(quarter_count['year']-2000).astype(str),
y=quarter_count['negative'],
label='negative',lw=2,
color='lightblue',
legend=False)
sns.lineplot(x=quarter_count['quarter'].astype(str)+'Q'+(quarter_count['year']-2000).astype(str),
y=quarter_count['positive'],
label='positive',
color='lightcoral',
ls='dashed', lw=2,
legend=False)
sns.lineplot(x=quarter_count['quarter'].astype(str)+'Q'+(quarter_count['year']-2000).astype(str),
y=quarter_count['neutral'],
label='neutral',
color='grey',lw=2,
ls='dashdot',
legend=False)
plt.xticks(rotation=25)
plt.xlabel('')
plt.ylabel('Proportion of sentiment', fontsize=14)
for label in ax1.get_yticklabels():
label.set_size(fontsize=14)
ax1.axvspan(2,4, facecolor='orange', alpha=0.2, edgecolor=None)
for label in ax1.get_xticklabels():
label.set_size(fontsize=12)
fig.legend(fontsize=14, loc='center left', bbox_to_anchor=(0.9, 0.5))
3.1.2 Topic modelling on negative-sentiment texts
3.1.2.1 BERTopic
# Read the sentiment and summarised data
sentiment_df = pd.read_csv(output_data_folder + "/sentiment_full_result.csv")
df_phi_fulltable_summarised = pd.read_excel(processed_data_folder + '/phi_fulltable_summarised.xlsx')
# subset relevant columns from sentiment file
sentiment_colname = 'financial-roberta-large_sentiment'
sentiment_df_22Q1= sentiment_df[(sentiment_df.year == 2022) & (sentiment_df.quarter == 1)]
sentiment_df_22Q1 = sentiment_df_22Q1[['uid', sentiment_colname]]
# Subset relevant columns from summarised text file
df_phi_fulltable_summarised_2022Q1 = df_phi_fulltable_summarised[(df_phi_fulltable_summarised.year == 2022) &
(df_phi_fulltable_summarised.quarter == 1)]
summarised_22Q1_df = df_phi_fulltable_summarised_2022Q1[['uid', 'summarised_text']]
# Further filter sentiment to only include negative sentiments
sentiment_df_22Q1 = sentiment_df_22Q1[sentiment_df_22Q1[sentiment_colname] == 'negative'][['uid', sentiment_colname]]
# Merge negative-sentiment rows with the summarised text data
df_sentiment_22Q1_summarised = pd.merge(sentiment_df_22Q1,summarised_22Q1_df , on=['uid'], how='inner')
df_phi_2022Q1_neg_list = df_sentiment_22Q1_summarised['summarised_text'].apply(preprocess_spacy).to_list()
dim_model = PCA(n_components=11)
cluster_model = KMeans(n_clusters=11)
# Pass the PCA model as the dimensionality-reduction step (BERTopic accepts any
# model with fit/transform in place of UMAP); otherwise dim_model would go unused.
topic_model_phi_2022Q1_neg = BERTopic(embedding_model=embedding_model,
umap_model=dim_model,
hdbscan_model=cluster_model, calculate_probabilities=True)
topics, probabilities = topic_model_phi_2022Q1_neg.fit_transform(df_phi_2022Q1_neg_list)
# Reduce topics with higher diversity
topic_model_phi_2022Q1_neg = topic_model_phi_2022Q1_neg.reduce_topics(df_phi_2022Q1_neg_list, nr_topics=4)
# Write each BERTopic visualisation to an HTML file
topic_model_phi_2022Q1_neg.visualize_topics().write_html("topic_model_phi_2022Q1_neg_topic.html")
topic_model_phi_2022Q1_neg.visualize_barchart(n_words=10, autoscale=True).write_html("topic_model_phi_2022Q1_neg_topic_barchart.html")
topic_model_phi_2022Q1_neg.visualize_heatmap().write_html("topic_model_phi_2022Q1_neg_topic_heatmap.html")
topic_model_phi_2022Q1_neg.visualize_hierarchy().write_html("topic_model_phi_2022Q1_neg_topic_hierarchy.html")
# Display each saved plot
#display(HTML("topic_model_phi_2022Q1_neg_topic.html"))
display(HTML("topic_model_phi_2022Q1_neg_topic_barchart.html"))
display(HTML("topic_model_phi_2022Q1_neg_topic_heatmap.html"))
display(HTML("topic_model_phi_2022Q1_neg_topic_hierarchy.html"))
plt.tight_layout()
plt.show()
3.1.2.2 FinBERT
3.1.2.2.0 Functions
# Standardise analyst name formatting by omitting middle initials and expanding shortened first names.
def standardise_analyst_names(df, bank):
    if bank=='JPMorgan':
        analyst_dict = {
            "Charles W. Peabody": "Charles Peabody",
            "Ebrahim H. Poonawala": "Ebrahim Poonawala",
            "Jim Mitchell": "James Mitchell",
            "John E. McDonald": "John McDonald",
            "Kenneth M. Usdin": "Kenneth Usdin",
            "Ken Usdin": "Kenneth Usdin",
            "Matt O’Connor": "Matt O'Connor"
        }
        return df['name'].replace(analyst_dict)
    # No mapping defined for other banks; return names unchanged rather than None
    return df['name']
# Plotting the topic probability distributions for a specific condition (e.g. analyst name, quarter, etc.)
def plot_topics_by_condition(condition_dict, finbert_folder, data_folder, label_dict,
                             chunking=True, max_length=512, bank='JPMorgan',
                             datatype='all', appdx="", summarised=False,
                             synthetic=False, sentiment=None, sentiment_folder=None,
                             save=False):
    """
    condition_dict should be a dictionary of len=1:
    the key should be a column name, the value should be a value in that column.
    """
    df_merged, _, metadata = get_merged_data_for_plotting(finbert_folder, data_folder,
                                                          label_dict, chunking=chunking, appdx=appdx,
                                                          max_length=max_length, bank=bank, datatype=datatype,
                                                          summarised=summarised, synthetic=synthetic,
                                                          sentiment=sentiment, sentiment_folder=sentiment_folder)
    key = list(condition_dict.keys())[0]
    # Standardise analyst names (see 5.1 below - adding it here to reuse this function later)
    if key == 'name':
        df_merged['name'] = standardise_analyst_names(df_merged, bank)
    if not isinstance(condition_dict[key], list):
        condition_dict[key] = [condition_dict[key]]
    df_qrt = df_merged[df_merged[key].isin(condition_dict[key])].copy()
    topic_labels = list(label_dict.keys())
    cmap = sns.color_palette("tab20")
    prob_cols = [f"topic_{i}_prob" for i in label_dict.keys()]
    # Order topic labels by their prevalence
    if sentiment is None or sentiment == 'negative':
        topic_labels = df_qrt[prob_cols].median(axis=0).sort_values(ascending=False).index
    else:  # use the same order as in negative-sentiment texts
        df_merged_neg, _, _ = get_merged_data_for_plotting(finbert_folder, data_folder,
                                                           label_dict, chunking=chunking, appdx=appdx,
                                                           max_length=max_length, bank=bank, datatype=datatype,
                                                           summarised=summarised, synthetic=synthetic,
                                                           sentiment='negative', sentiment_folder=sentiment_folder)
        if key == 'name':
            df_merged_neg['name'] = standardise_analyst_names(df_merged_neg, bank)
        df_qrt_neg = df_merged_neg[df_merged_neg[key].isin(condition_dict[key])].copy()
        topic_labels = df_qrt_neg[prob_cols].median(axis=0).sort_values(ascending=False).index
    fig, ax = plt.subplots(1, 1, figsize=(5, 5), sharex=True)
    ax.vlines(0.05, 0, len(topic_labels), ls='dashed', color='grey', lw=0.5)
    for num, topic_label in enumerate(topic_labels):
        i = int(topic_label.split("_")[1])
        # Subset the df to keep only one topic
        cols_to_keep = ["finbert_topic_id", topic_label]
        df_topic = df_qrt[cols_to_keep].copy()
        # Plot
        sns.boxplot(x=df_topic[topic_label],
                    y=[num] * df_topic.shape[0],
                    orient='h',
                    showfliers=True,
                    color=cmap[i],
                    flierprops=dict(marker='o', markerfacecolor=cmap[i], markersize=6, alpha=0.7)
                    )
    fsize = 14
    ax.set_yticks(np.arange(len(topic_labels)),
                  labels=[label_dict[int(i.split("_")[1])] for i in topic_labels],
                  fontsize=fsize)
    ax.set_xlabel("Topic probability", fontsize=fsize)
    ax.tick_params(axis='x', labelsize=fsize)
    if save:
        # Join the condition values for a readable filename (f-string of dict.values() would print as dict_values([...]))
        fig.savefig(f"topic_boxplots_{key}_{'_'.join(map(str, condition_dict[key]))}.png",
                    dpi=300, bbox_inches="tight")
3.1.2.2.1 Plotting probability distributions¶
We will start by plotting the topic probability distribution of negative texts in the quarter of interest.
plot_topics_by_condition(condition_dict={"quarter_str": ["1Q22"]},
finbert_folder=output_data_folder,
data_folder=output_data_folder,
label_dict=id2label,
max_length=512,
chunking=False,
bank='JPMorgan',
datatype='QA',
sentiment='negative',
appdx='_summarised',
sentiment_folder=output_data_folder,
save=True)
Topics with the highest probabilities in negative-sentiment texts are Macro, General News | Opinion, and Fed | Central Banks. However, these topics are assigned fairly frequently in general and might not reflect an effect specific to negative sentiment.
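One way to probe whether these topics are tied to negative sentiment specifically, rather than just frequent overall, is to compare median topic probabilities in negative texts against all texts. A minimal sketch; the probability values below are toy numbers for illustration, not our model outputs:

```python
import statistics

# Toy per-text topic probabilities, keyed by FinBERT topic label (illustrative values only)
neg_probs = {"Macro": [0.40, 0.55, 0.35], "Fed | Central Banks": [0.30, 0.25, 0.45]}
all_probs = {"Macro": [0.38, 0.50, 0.33, 0.41], "Fed | Central Banks": [0.10, 0.12, 0.15, 0.08]}

# A topic with a large positive "lift" is more prevalent in negative texts specifically
for topic in neg_probs:
    lift = statistics.median(neg_probs[topic]) - statistics.median(all_probs[topic])
    print(f"{topic}: lift={lift:+.3f}")
```

In this toy example, Macro is near-zero lift (frequent everywhere), while a topic that only surfaces in negative texts shows a clear positive lift.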
3.1.3 Evasion¶
# Filter Dataframe for quarter 1Q22
filtered_df_1Q22 = quarter_df_results[(quarter_df_results['quarter_ID'] == '1Q22')]
# Plot a pie chart
def plot_pie_chart(data, title):
    results_series = data.squeeze()
    results_counts = results_series.value_counts()
    plt.figure(figsize=(5, 5))
    plt.pie(results_counts, labels=results_counts.index, autopct='%1.1f%%', startangle=90, colors=['lightcoral', 'lightblue'])
    plt.title(f"Evasive vs Not-Evasive answers in {title}")
    plt.show()
# Plot a pie chart
plot_pie_chart(filtered_df_1Q22['Phi-3.5 Evasion Present'], '1Q22')
3.1.3.1 Topic modelling on evasive texts¶
3.1.3.1.1 BERTopic¶
def extract_keywords(model):
    topic_words = set()
    for topic_num in range(len(model.get_topics())):
        # Extract the words for each topic
        words = [word for word, _ in model.get_topic(topic_num)]
        topic_words.update(words)
        # Print topic number, words, and the number of words
        print(f"Topic {topic_num}: {words} (Count: {len(words)})")
    return topic_words
# subset relevant columns from combined evasion file
qevasion_colname = 'Phi-3.5 Evasion Present'
qevasion_df = filtered_df_1Q22[['qa_num', qevasion_colname]]
# subset relevant columns from summarised text file
fulltable_summarised_df = df_phi_fulltable_summarised[['uid', 'qa_num', 'summarised_text']]
qevasion_status='Evasive'
df_qevasion_summarised = pd.merge(fulltable_summarised_df, qevasion_df, on=['qa_num'], how='left')
df_qevasion_summarised['quarter_str'] = [x.split("_")[1] for x in df_qevasion_summarised['uid']]
df_evaded_summarised = df_qevasion_summarised[df_qevasion_summarised[qevasion_colname].eq(qevasion_status)].copy().reset_index(drop=True)
quarter_str_status_22 = '1Q22'
quarter_colname = 'quarter_str'
df_evaded_summarised_22Q1 = df_evaded_summarised[df_evaded_summarised[quarter_colname].eq(quarter_str_status_22)].copy().reset_index(drop=True)
df_evaded_summarised_22Q1_list = df_evaded_summarised_22Q1['summarised_text'].apply(preprocess_spacy).to_list()
dim_model = PCA(n_components=5)
cluster_model = KMeans(n_clusters=20)
topic_evaded_summarised_22Q1 = BERTopic(umap_model=dim_model, embedding_model=embedding_model,
hdbscan_model=cluster_model, calculate_probabilities=True)
topics, probs = topic_evaded_summarised_22Q1.fit_transform(df_evaded_summarised_22Q1_list)
topic_evaded_summarised_22Q1 = topic_evaded_summarised_22Q1.reduce_topics(df_evaded_summarised_22Q1_list, nr_topics=5)
# Write each BERTopic visualisation to an HTML file
topic_evaded_summarised_22Q1.visualize_topics().write_html("topic_evaded_summarised_22Q1_topic.html")
topic_evaded_summarised_22Q1.visualize_barchart(top_n_topics=20,n_words=8, autoscale=True).write_html("topic_evaded_summarised_22Q1_barchart.html")
topic_evaded_summarised_22Q1.visualize_heatmap().write_html("topic_evaded_summarised_22Q1_heatmap.html")
topic_evaded_summarised_22Q1.visualize_hierarchy().write_html("topic_evaded_summarised_22Q1_hierarchy.html")
# Display each saved visualisation
#display(HTML("topic_evaded_summarised_22Q1_topic.html"))
display(HTML("topic_evaded_summarised_22Q1_barchart.html"))
display(HTML("topic_evaded_summarised_22Q1_heatmap.html"))
display(HTML("topic_evaded_summarised_22Q1_hierarchy.html"))
plt.tight_layout()
plt.show()
3.1.4 Evasion + Negativity¶
# Extract keywords from each model
keywords_evaded_22Q1 = extract_keywords(topic_evaded_summarised_22Q1)
keywords_phi_22Q1neg = extract_keywords(topic_model_phi_2022Q1_neg)
# Create a Venn diagram
venn_22 = venn2(
[keywords_evaded_22Q1, keywords_phi_22Q1neg],
('Evaded', 'Negative')
)
# Set colors
venn_22.get_patch_by_id('10').set_color('#FFBF00') # Evaded only
venn_22.get_patch_by_id('01').set_color('#00BFFF') # Negative only
# Add a title
plt.title("BERTopic Word Frequencies")
plt.show()
# Inspect overlaps
intersection_evaded_phi_22Q1neg = keywords_evaded_22Q1 & keywords_phi_22Q1neg
print("Overlap between Evaded and Phi Negative:", intersection_evaded_phi_22Q1neg)
Overlap between Evaded and Phi Negative: {'inflation', 'financial', 'price', 'capital', 'excessive', 'ukraine', 'performance', 'criticizes', 'growth', 'war', 'indicating', 'discusses', 'detracts', 'discussion', 'viewing'}
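The degree of keyword overlap can also be quantified, for example with a Jaccard index. A minimal sketch; the two sets below are illustrative stand-ins for the extracted keyword sets above:

```python
def jaccard_index(set_a, set_b):
    """Share of keywords common to both sets, out of all keywords in either set."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

# Illustrative stand-ins for keywords_evaded_22Q1 and keywords_phi_22Q1neg
evaded = {"inflation", "capital", "growth", "war", "ukraine"}
negative = {"inflation", "capital", "growth", "price", "financial"}

print(f"Jaccard index: {jaccard_index(evaded, negative):.2f}")  # 3 shared words out of 7 total
```

Tracking this index across quarters would give a simple scalar signal of how closely evasion and negativity co-occur thematically.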
3.2 Recent two quarters¶
By looking at the two most recent quarters, we can identify emerging risks.
3.2.0 Initial data exploration¶
# Getting the unique metric types
unique_metric_types = metrics_df['metric_type'].unique()
# Defining colors for each plot
colors = ['blue', 'green', 'red', 'purple']
# Plotting in a 2x2 grid
fig, axes = plt.subplots(2, 2, figsize=(18, 9))
# Flattening axes array for easy iteration
axes = axes.flatten()
# Looping through metric types and corresponding axes
for i, metric in enumerate(unique_metric_types):
    subset = metrics_df[metrics_df['metric_type'] == metric]
    axes[i].plot(subset['Q&FY'], subset['metric_value'], marker='o', color=colors[i])
    axes[i].set_title(f'Metric: {metric}', fontsize=16, color=colors[i])
    axes[i].set_xlabel('Quarter', fontsize=14)
    axes[i].set_ylabel('Metric Value', fontsize=14)
    axes[i].tick_params(axis='x', labelsize=12, rotation=45)
    axes[i].tick_params(axis='y', labelsize=12)
    axes[i].grid(visible=True, linestyle='--', alpha=0.5)
# Adjusting the layout
plt.tight_layout()
# Showing the plots
plt.show()
Over the last two quarters, the bank has maintained a high CET1 ratio, indicating a strong capital position. However, net income and EPS spiked in 2Q24 before dropping in 3Q24, signaling earnings volatility that could impact profitability. Additionally, elevated provisions for credit losses suggest caution around potential credit risks. Overall, while capital strength is reassuring, addressing credit risk and earnings instability will be crucial for sustaining financial health and investor confidence.
# Creating a dataframe for the 2Q24 and 3Q24 Q&As
Q23_FY24 = transcripts_df[(transcripts_df['date'] >= '2024-07-01') & (transcripts_df['date'] < '2024-11-01')]
# Pulling all words from qa_text_processed column into a list
Q23_FY24_all_words = [word for tokens in Q23_FY24['qa_text_processed'] for word in tokens]
# Calculating the frequency distribution of the words from the dataset
Q23_FY24_freq_dist = FreqDist(Q23_FY24_all_words)
# Getting the top 10 words and their frequencies from the 2Q24 and 3Q24 Q&As
Q23_FY24_top_10 = Q23_FY24_freq_dist.most_common(10)
Q23_FY24_words, Q23_FY24_counts = zip(*Q23_FY24_top_10)
# Setting up the plot
fig, ax1 = plt.subplots(figsize=(14, 6))
# Plotting the top 10 words in a barplot for the 2Q24 and 3Q24 Q&As
sns.barplot(x=list(Q23_FY24_words), y=list(Q23_FY24_counts), palette="viridis", ax=ax1)
ax1.set_title('Top 10 Words in the 2Q24 and 3Q24 Q&As')
ax1.set_xlabel('Words')
ax1.set_ylabel('Frequency')
ax1.set_xticklabels(ax1.get_xticklabels(), rotation=45)
# Adjusting the layout
plt.tight_layout()
# Showing the plots
plt.show()
# Creating a dataframe excluding 2Q24 and 3Q24 Q&As
not_Q23_FY24 = transcripts_df[(transcripts_df['date'] < '2024-07-01') | (transcripts_df['date'] >= '2024-11-01')]
# Getting unique words for the 2Q24 and 3Q24 Q&As
Q23_FY24_unique_words = set(Q23_FY24_freq_dist.keys()) - set(not_Q23_FY24['qa_text_processed'].explode())
Q23_FY24_unique_counts = {word: Q23_FY24_freq_dist[word] for word in Q23_FY24_unique_words}
Q23_FY24_unique_df = pd.DataFrame(Q23_FY24_unique_counts.items(), columns=['word', 'Q23_FY24_frequency'])
# Getting the top 10 unique words for the 2Q24 and 3Q24 Q&As
Q23_FY24_top_words = Q23_FY24_unique_df.nlargest(10, 'Q23_FY24_frequency')
# Setting up the plot
fig, ax1 = plt.subplots(figsize=(14, 6))
# Creating a bar plot for the top 10 2Q24 and 3Q24 exclusive words
sns.barplot(data=Q23_FY24_top_words, x='word', y='Q23_FY24_frequency', palette='Blues')
plt.title('Top 10 Unique Words Used in the 2Q24 and 3Q24 Q&As')
plt.xlabel('Words')
plt.ylabel('Frequency')
plt.xticks(rotation=45)
# Adjusting the layout
plt.tight_layout()
# Showing the plots
plt.show()
Words like "spike" and "trough" could reflect the recent fluctuations in net income and EPS.
# Calculating word frequencies and total word count in 2Q24 and 3Q24 Q&As
Q23_FY24_freq = transcripts_df[(transcripts_df['date'] >= '2024-07-01') & (transcripts_df['date'] < '2024-11-01')]
Q23_FY24_word_counts = Q23_FY24_freq['qa_text_processed'].explode().value_counts()
Q23_FY24_total_words = Q23_FY24_word_counts.sum() # Total word count for 2Q24 & 3Q24
# Calculating word frequencies and total word count in other quarters
not_Q23_FY24_word_counts = not_Q23_FY24['qa_text_processed'].explode().value_counts()
not_Q23_FY24_total_words = not_Q23_FY24_word_counts.sum() # Total word count for other quarters
# Creating a DataFrame comparing relative frequencies
word_comparison_df = pd.DataFrame({
'Q23_FY24_proportion': Q23_FY24_word_counts / Q23_FY24_total_words,
'other_quarters_proportion': not_Q23_FY24_word_counts / not_Q23_FY24_total_words
}).fillna(0)
# Adding a column for proportion difference
word_comparison_df['proportion_difference'] = word_comparison_df['Q23_FY24_proportion'] - word_comparison_df['other_quarters_proportion']
# Filtering for words that are relatively more frequent in 2Q24 and 3Q24
higher_in_Q23_FY24 = word_comparison_df[word_comparison_df['proportion_difference'] > 0]
# Selecting the top 10 words with the highest proportion difference
top_higher_words = higher_in_Q23_FY24.nlargest(10, 'proportion_difference').reset_index().rename(columns={'qa_text_processed': 'word'})
# Plotting the top words by proportional difference
fig, ax = plt.subplots(figsize=(14, 6))
sns.barplot(data=top_higher_words, x='word', y='proportion_difference', palette='Purples')
plt.title('Top 10 Words with Higher Proportion in 2Q24 and 3Q24 Compared to Other Quarters')
plt.xlabel('Words')
plt.ylabel('Proportional Difference')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
A higher usage of "capital" could relate to how the bank has maintained a high CET1 capital ratio, indicating a strong capital position.
A higher usage of "nii" (net interest income) could relate to the recent fluctuations in net income and EPS.
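Proportion differences tend to favour words that are already frequent everywhere; a smoothed log-ratio is one common alternative that also surfaces distinctive lower-frequency words. A minimal sketch with toy counts (not the actual corpus tallies):

```python
import math

# Toy word counts standing in for the 2Q24/3Q24 vs other-quarter tallies above
recent = {"capital": 40, "nii": 25, "spike": 6, "bank": 300}
other = {"capital": 120, "nii": 30, "spike": 1, "bank": 2400}

vocab = set(recent) | set(other)
# Add-one smoothing so words absent from one corpus don't zero out the ratio
n_recent = sum(recent.values()) + len(vocab)
n_other = sum(other.values()) + len(vocab)

def smoothed_log_ratio(word):
    p_r = (recent.get(word, 0) + 1) / n_recent
    p_o = (other.get(word, 0) + 1) / n_other
    return math.log(p_r / p_o)

# Most 2Q24/3Q24-distinctive words first
ranked = sorted(vocab, key=smoothed_log_ratio, reverse=True)
print(ranked)
```

On these toy counts, the rare-but-recent "spike" ranks above the common "capital", whereas a raw proportion difference would bury it.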
3.2.1 Sentiment¶
full_result = pd.read_csv(output_data_folder + "/sentiment_full_result.csv")
full_result['date'] = pd.to_datetime(full_result['date'])
# Average score per call date; reset_index so 'date' is a column we can merge on
score_by_quarter = full_result.groupby('date')['financial-roberta-large_score'].mean().reset_index()
year_quarter = full_result[['date', 'year', 'quarter']].drop_duplicates()
score_by_quarter = pd.merge(score_by_quarter, year_quarter, on='date', how='left')
score_by_quarter['year_quarter'] = score_by_quarter['quarter'].astype(str) + 'Q' + (score_by_quarter['year']-2000).astype(str)
score_by_quarter.set_index('year_quarter', inplace=True)
fig, ax1 = plt.subplots(figsize=(10,3))
sns.lineplot(x=score_by_quarter.index, y=score_by_quarter['financial-roberta-large_score'], ax=ax1,
color='orange', lw=2, label='average sentiment', legend=False)
xticks = [f"{row['quarter']}Q{row['year']-2000}" for _, row in score_by_quarter.iterrows()]
plt.xticks(ticks=score_by_quarter.index, labels=xticks, rotation=25)
ax1.set_xlabel('')
ax1.set_ylabel('Average sentiment score', fontsize=14)
ax1.set_ylim(-0.3, 0.3)
for label in ax1.get_yticklabels():
    label.set_size(fontsize=14)
plt.axhline(y=0, lw=0.8, ls=(0, (5,10)), color='black')
ax1.axvspan(12,13, facecolor='orange', alpha=0.2, edgecolor=None)
ax1.set_title("Average sentiment score over time")
for label in ax1.get_xticklabels():
    label.set_size(fontsize=12)
fig.legend(fontsize=14, loc='center left', bbox_to_anchor=(0.9, 0.5))
quarter_count = full_result.groupby(['year', 'quarter'])['financial-roberta-large_sentiment'].value_counts(normalize=True).reset_index()
quarter_count = quarter_count.pivot(index=['year', 'quarter'], columns='financial-roberta-large_sentiment', values='proportion').reset_index()
fig, ax1 = plt.subplots(figsize=(10,3))
sns.lineplot(x=quarter_count['quarter'].astype(str)+'Q'+(quarter_count['year']-2000).astype(str),
y=quarter_count['negative'],
label='negative',lw=2,
color='lightblue',
legend=False)
sns.lineplot(x=quarter_count['quarter'].astype(str)+'Q'+(quarter_count['year']-2000).astype(str),
y=quarter_count['positive'],
label='positive',
color='lightcoral',
ls='dashed', lw=2,
legend=False)
sns.lineplot(x=quarter_count['quarter'].astype(str)+'Q'+(quarter_count['year']-2000).astype(str),
y=quarter_count['neutral'],
label='neutral',
color='grey',lw=2,
ls='dashdot',
legend=False)
plt.xticks(rotation=25)
plt.xlabel('')
plt.ylabel('Proportion of sentiment', fontsize=14)
for label in ax1.get_yticklabels():
    label.set_size(fontsize=14)
ax1.axvspan(12,13, facecolor='orange', alpha=0.2, edgecolor=None)
for label in ax1.get_xticklabels():
    label.set_size(fontsize=12)
fig.legend(fontsize=14, loc='center left', bbox_to_anchor=(0.9, 0.5))
Here we look at the evolution of sentiment over time, with the most recent two quarters highlighted.
The average sentiment decreases. However, this is not because negative sentiment has increased, but because neutral sentiment has become more prevalent at the expense of positive sentiment.
recent_result = full_result[(full_result['year']==2024) & full_result['quarter'].isin([3,2])]
recent_count = recent_result['financial-roberta-large_sentiment'].value_counts()
plt.pie(recent_count, labels=recent_count.index, autopct='%1.1f%%', startangle=90, colors=['grey', 'lightcoral', 'lightblue'])
plt.title('Question and answer sentiment from 2024Q2 and 2024Q3')
plt.show()
This is the sentiment split in the highlighted quarters: mostly neutral, with a fairly even split of positive and negative.
The negative texts are worth exploring further, since we are interested in emerging risks.
3.2.2 Topic modelling on negative-sentiment texts¶
3.2.2.1 BERTopic¶
sentiment_df = pd.read_csv(output_data_folder + "/sentiment_full_result.csv")
df_phi_fulltable_summarised = pd.read_excel(processed_data_folder + '/phi_fulltable_summarised.xlsx')
# subset relevant columns from sentiment file
sentiment_df_24Q2Q3= sentiment_df[(sentiment_df.year.isin([2024])) & (sentiment_df.quarter.isin([2, 3]))]
sentiment_colname = 'financial-roberta-large_sentiment'
sentiment_df_24Q2Q3 = sentiment_df_24Q2Q3[['uid', sentiment_colname]]
# subset relevant columns from summarised text file (the inner merge on 'uid' below restricts rows to 2024 Q2/Q3)
summarised_24Q2Q3_df = df_phi_fulltable_summarised[['uid', 'summarised_text']]
sentiment_df_24Q2Q3 = sentiment_df_24Q2Q3[sentiment_df_24Q2Q3['financial-roberta-large_sentiment'] == 'negative'][['uid', 'financial-roberta-large_sentiment']]
# merge negative-sentiment data with the summarised texts
df_sentiment_24Q2Q3_summarised = pd.merge(sentiment_df_24Q2Q3,summarised_24Q2Q3_df , on=['uid'], how='inner')
df_phi_2024Q2Q3_neg_list = df_sentiment_24Q2Q3_summarised['summarised_text'].apply(preprocess_spacy).to_list()
dim_model = PCA(n_components=18)
cluster_model = KMeans(n_clusters=18)
# Pass the PCA model as the dimensionality-reduction step, as in the other BERTopic runs
topic_model_phi_2024Q2Q3_neg = BERTopic(umap_model=dim_model, embedding_model=embedding_model,
                                        hdbscan_model=cluster_model, calculate_probabilities=True)
topics, probabilities = topic_model_phi_2024Q2Q3_neg.fit_transform(df_phi_2024Q2Q3_neg_list)
# Reduce the number of topics to 5
topic_model_phi_2024Q2Q3_neg = topic_model_phi_2024Q2Q3_neg.reduce_topics(df_phi_2024Q2Q3_neg_list, nr_topics=5)
# Write each BERTopic visualisation to an HTML file
topic_model_phi_2024Q2Q3_neg.visualize_topics().write_html("topic_model_phi_2024Q2Q3_neg_topic.html")
topic_model_phi_2024Q2Q3_neg.visualize_barchart(n_words=8, autoscale=True).write_html("topic_model_phi_2024Q2Q3_neg_barchart.html")
topic_model_phi_2024Q2Q3_neg.visualize_heatmap().write_html("topic_model_phi_2024Q2Q3_neg_heatmap.html")
topic_model_phi_2024Q2Q3_neg.visualize_hierarchy().write_html("topic_model_phi_2024Q2Q3_neg_hierarchy.html")
# Display each saved visualisation
#display(HTML("topic_model_phi_2024Q2Q3_neg_topic.html"))
display(HTML("topic_model_phi_2024Q2Q3_neg_barchart.html"))
display(HTML("topic_model_phi_2024Q2Q3_neg_heatmap.html"))
display(HTML("topic_model_phi_2024Q2Q3_neg_hierarchy.html"))
plt.tight_layout()
plt.show()
3.2.2.2 FinBERT¶
Topic distribution in negative sentiment texts.
plot_topics_by_condition(condition_dict={"quarter_str": ["2Q24"]},
                         finbert_folder=output_data_folder,
                         data_folder=output_data_folder,
                         label_dict=id2label,
                         max_length=512,
                         chunking=False,
                         bank='JPMorgan',
                         datatype='QA',
                         sentiment='negative',
                         appdx='_summarised',
                         sentiment_folder=output_data_folder)
Top three topics in negative texts:
- Fed | Central Banks
- Macro
- Financials
3.2.3 Evasion¶
filtered_df_last2Q = quarter_df_results[(quarter_df_results['quarter_ID'] == '2Q24') | (quarter_df_results['quarter_ID'] == '3Q24')]
# Plot a pie chart
plot_pie_chart(filtered_df_last2Q['Phi-3.5 Evasion Present'], '2Q24/3Q24')
Over 80% of answers were classed as evasive.
Note that Phi-3.5 tends to overestimate evasiveness in answers.
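This overestimation could be quantified by hand-labelling a small sample of answers and computing the precision and recall of the model's "Evasive" calls. A minimal sketch; the label pairs below are invented for illustration, not real annotations:

```python
# Hypothetical (model_label, human_label) pairs from a hand-labelled sample
sample = [
    ("Evasive", "Evasive"), ("Evasive", "Not-Evasive"),
    ("Evasive", "Evasive"), ("Evasive", "Not-Evasive"),
    ("Not-Evasive", "Not-Evasive"), ("Evasive", "Evasive"),
]

tp = sum(m == "Evasive" and h == "Evasive" for m, h in sample)
fp = sum(m == "Evasive" and h == "Not-Evasive" for m, h in sample)
fn = sum(m == "Not-Evasive" and h == "Evasive" for m, h in sample)

precision = tp / (tp + fp)                      # how often an 'Evasive' call is right
recall = tp / (tp + fn) if (tp + fn) else 0.0   # how many true evasions are caught
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A low precision with high recall would be the signature of the overcalling behaviour noted above.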
3.2.3.1 Topic modelling on evasive texts¶
3.2.3.1.1 BERTopic¶
df_phi_fulltable_summarised = pd.read_excel(processed_data_folder + '/phi_fulltable_summarised.xlsx')
# subset relevant columns from combined evasion file
qevasion_colname = 'Phi-3.5 Evasion Present'
qevasion_df = filtered_df_last2Q[['qa_num', qevasion_colname]]
# subset relevant columns from summarised text file
fulltable_summarised_df = df_phi_fulltable_summarised[['uid', 'qa_num', 'summarised_text']]
qevasion_status='Evasive'
df_qevasion_summarised = pd.merge(fulltable_summarised_df, qevasion_df, on=['qa_num'], how='left')
df_qevasion_summarised['quarter_str'] = [x.split("_")[1] for x in df_qevasion_summarised['uid']]
df_evaded_summarised = df_qevasion_summarised[df_qevasion_summarised[qevasion_colname].eq(qevasion_status)].copy().reset_index(drop=True)
quarter_str_status_24 = ('2Q24', '3Q24')
quarter_colname = 'quarter_str'
# Keep only the two quarters of interest (comparing against a tuple with != would keep every row)
df_evaded_summarised_24Q2Q3 = df_evaded_summarised[df_evaded_summarised[quarter_colname].isin(quarter_str_status_24)].copy().reset_index(drop=True)
df_evaded_summarised_24Q2Q3_list = df_evaded_summarised_24Q2Q3['summarised_text'].apply(preprocess_spacy).to_list()
dim_model = PCA(n_components=5)
cluster_model = KMeans(n_clusters=20)
topic_evaded_summarised_24Q2Q3 = BERTopic(umap_model=dim_model, embedding_model=embedding_model,
hdbscan_model=cluster_model, calculate_probabilities=True)
topics, probs = topic_evaded_summarised_24Q2Q3.fit_transform(df_evaded_summarised_24Q2Q3_list)
topic_evaded_summarised_24Q2Q3 = topic_evaded_summarised_24Q2Q3.reduce_topics(df_evaded_summarised_24Q2Q3_list, nr_topics=7)
# Write each BERTopic visualisation to an HTML file
topic_evaded_summarised_24Q2Q3.visualize_topics().write_html("topic_evaded_summarised_24Q2Q3_topic.html")
topic_evaded_summarised_24Q2Q3.visualize_barchart(top_n_topics=20,n_words=8, autoscale=True).write_html("topic_evaded_summarised_24Q2Q3_barchart.html")
topic_evaded_summarised_24Q2Q3.visualize_heatmap().write_html("topic_evaded_summarised_24Q2Q3_heatmap.html")
topic_evaded_summarised_24Q2Q3.visualize_hierarchy().write_html("topic_evaded_summarised_24Q2Q3_hierarchy.html")
# Display each saved visualisation
#display(HTML("topic_evaded_summarised_24Q2Q3_topic.html"))
display(HTML("topic_evaded_summarised_24Q2Q3_barchart.html"))
display(HTML("topic_evaded_summarised_24Q2Q3_heatmap.html"))
display(HTML("topic_evaded_summarised_24Q2Q3_hierarchy.html"))
plt.tight_layout()
plt.show()
Topics associated with evaded questions were capital market and growth dynamics (including NII), and financial indicators.
3.2.4 Evasion + Negativity¶
To look at the overlap between topics raised in evasive and negative-sentiment texts, we create a Venn diagram.
# Create a Venn diagram
keywords_evaded_24Q2Q3 = extract_keywords(topic_evaded_summarised_24Q2Q3)
keywords_phi_24Q2Q3neg = extract_keywords(topic_model_phi_2024Q2Q3_neg)
venn_24 = venn2(
[keywords_evaded_24Q2Q3, keywords_phi_24Q2Q3neg],
('Evaded ', 'Negative ')
)
# Set colors
venn_24.get_patch_by_id('10').set_color('#FFBF00') # Evaded only
venn_24.get_patch_by_id('01').set_color('#00BFFF') # Negative only
# Add a title
plt.title("BERTopic Word Frequencies ")
plt.show()
# Inspect overlaps
intersection_evaded_phi_24Q2Q3neg = keywords_evaded_24Q2Q3 & keywords_phi_24Q2Q3neg
print("Overlap between Evaded and Phi Negative:", intersection_evaded_phi_24Q2Q3neg)
Overlap between Evaded and Phi Negative: {'income', 'rate', 'market', 'modest', 'performance', 'deposit', 'growth', 'potential', 'closer', 'curve', 'ongoing', 'change', 'relation', 'nii'}
These are the most common words in the topics discovered by our BERTopic models. They correspond to the most frequent themes in our analyses of both negative-sentiment and evaded texts.
3.2.5 Regulatory keyword outputs¶
# Function to create a wordcloud (basic)
def create_wordcloud_basic(title, topics):
wordcloud = WordCloud(width=1000, height=600, background_color='white').generate(" ".join(topics))
plt.figure(figsize=(10, 6))
plt.imshow(wordcloud, interpolation='bilinear')
plt.title(f'{title}')
plt.axis('off')
plt.show()
# Filter the quarters of interest df to be where 'keywords' column is not null or empty
filtered_df_keywords = quarter_df_results[quarter_df_results['keywords'].notna() & (quarter_df_results['keywords'] != '')]
"""
Look at all text in answers where Basel III mentioned.
Create a wordcloud for easy visualisation of common words in these answers.
"""
# Extract and clean 'qa_text' by words
qa_topics = [
topic.strip()
for row in filtered_df_keywords['qa_text'].dropna()
for topic in row.split(' ')
]
# Remove uninformative words
words_to_remove = ['Jamie', 'obviously', 'thing', 'whatever', 'right', 'know', 'look', 'think', 'question', 'actually', 'really', 'still', 'III', 'yeah']
qa_topics = [topic for topic in qa_topics if topic.lower() not in map(str.lower, words_to_remove)]
# Plot wordcloud of qa_text where Basel III mentioned
create_wordcloud_basic('WordCloud of Q&As where Basel III mentioned', qa_topics)
"""
Create filtered dataframe containing only entries where 'keywords' is not empty.
Analyse the Evaded topics for these entries.
Note that all these answers were classed as Evasive.
"""
# Extract and clean the 'Phi-3.5 Evaded Topics
evaded_topics = [
topic.strip()
for row in filtered_df_keywords['Phi-3.5 Evaded Topics'].dropna()
for topic in row.split(',')
]
# Remove uninformative words (e.g., Basel III Endgame) from wordcloud (as the title includes Basel III)
words_to_remove = ['Basel', 'III', 'Endgame', 'Basel III Endgame']
evaded_wordcloud_topics = [topic for topic in evaded_topics if topic.lower() not in map(str.lower, words_to_remove)]
# Plot a wordcloud for evaded topics where Basel III mentioned
create_wordcloud_basic('Evaded Topics where Basel III mentioned', evaded_wordcloud_topics)
# Display list of topics
topic_series = pd.Series(evaded_topics)
topic_counts = topic_series.value_counts()
print("\nEvaded Topics where Basel III mentioned:")
display(topic_counts)
Evaded Topics where Basel III mentioned:
| Evaded topic | count |
|---|---|
| Basel III Endgame | 4 |
| NII normalization | 1 |
| digital banking | 1 |
| Impact on Lending | 1 |
| Proposal Details | 1 |
| Basel III | 1 |
| Capital Requirements | 1 |
| Capital Return and Buyback Trajectory | 1 |
| GSIB Surcharge Calculations | 1 |
| Capital Scenarios | 1 |
| ROTCE | 1 |
| competitive market | 1 |
| Yield curve effects | 1 |
| share growth | 1 |
| 17% capital | 1 |
| GSIB recalibration | 1 |
| SCB and CCAR | 1 |
| CET1 ratio | 1 |
| RWA | 1 |
| Economic environment | 1 |
| Overhead ratio | 1 |
| Cost-Benefit Analysis | 1 |
The evaded topics focus on capital requirements and returns, the GSIB surcharge, NII, the yield curve, and the CET1 ratio.
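The free-text labels in the table above can be rolled up into these broader buckets via simple keyword matching. A minimal sketch; the bucket keywords are our own illustrative choices, and only a subset of the labels is shown:

```python
from collections import Counter

# Evaded-topic labels from the table above (subset shown for brevity)
labels = ["Basel III Endgame", "NII normalization", "Capital Requirements",
          "Capital Return and Buyback Trajectory", "GSIB Surcharge Calculations",
          "CET1 ratio", "Yield curve effects", "GSIB recalibration", "17% capital"]

# Illustrative keyword buckets, checked in order; anything unmatched falls into 'other'
buckets = {"capital": ["capital", "cet1"], "gsib": ["gsib"],
           "nii/yield": ["nii", "yield"], "basel": ["basel"]}

def bucket_of(label):
    low = label.lower()
    for bucket, keys in buckets.items():
        if any(k in low for k in keys):
            return bucket
    return "other"

counts = Counter(bucket_of(l) for l in labels)
print(counts)
```

Such a roll-up makes the one-off labels comparable across quarters, at the cost of the keyword lists needing manual curation.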
4 Conclusions¶
To derive meaningful insight from transcripts, we recommend a solution pipeline that involves identification of texts with specific sentiment or question evasion status, followed by topic modelling using BERTopic on Phi-3.5-summarised text. By focusing on negative-sentiment Q&A in 1Q22, BERTopic highlighted themes like declining Net Interest Income in 55% of texts and geopolitical risks in 27% of texts. In evasive answers, the most common topic (60% of texts) concerned corporate growth, but geopolitical challenges were present too (9%). These findings enhanced our understanding of sentiment and thematic concerns during critical periods.
Emerging risks were identified through analysis of the two most recent quarters, where we observed a decline in the average sentiment, driven by greater neutrality at the expense of positive sentiment. Overlapping BERTopic-discovered themes in negative and evasive Q&As provided valuable insights into areas such as capital market uncertainty and reserve dynamics that might require closer regulatory attention. Detection of keywords related to regulatory changes like Basel III, capital adequacy, and credit loss provisions contextualised these findings, providing actionable insights into emerging risks.
The methodology shows potential for scalability with tailored preprocessing for different transcript formats. Automating transcript preprocessing would enhance adaptability. Generalised models, such as Phi-3.5 and BERTopic, already exhibit strong applicability across financial datasets, but fine-tuning of Phi-3.5 with domain-specific training datasets is likely to further improve performance.